Patent abstract:
APPARATUS FOR PROVIDING AN UPMIX SIGNAL REPRESENTATION BASED ON A DOWNMIX SIGNAL REPRESENTATION, APPARATUS FOR PROVIDING A BIT STREAM THAT REPRESENTS A MULTICHANNEL AUDIO SIGNAL, METHODS, COMPUTER PROGRAMS AND BIT STREAM REPRESENTING A MULTICHANNEL AUDIO SIGNAL USING A LINEAR COMBINATION PARAMETER. An apparatus for providing an upmix signal representation based on a downmix signal representation and object-related parametric information, which are included in a bit stream representation of an audio content, and depending on an interpretation matrix specified by the user, comprises a distortion limiter configured to obtain a modified interpretation matrix using a linear combination of the user-specified interpretation matrix and a target interpretation matrix depending on a linear combination parameter. The apparatus also comprises a signal processor configured to obtain the upmix signal representation based on the downmix signal representation and the object-related parametric information using the modified interpretation matrix. The apparatus is also configured to evaluate a bit stream element that represents the linear combination parameter in order to (...).
Publication number: BR112012012097B1
Application number: R112012012097-2
Filing date: 2010-11-16
Publication date: 2021-01-05
Inventors: Jonas Engdegard; Heiko Purnhagen; Juergen Herre; Cornelia FALCH; Oliver Hellmuth; Leonid Terentiev
Applicant: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.; Dolby International AB
Main IPC class:
Patent description:

TECHNICAL FIELD
The embodiments according to the invention relate to an apparatus for providing an upmix signal representation based on a downmix signal representation and object-related parametric information, which are included in a bit stream representation of an audio content, and depending on an interpretation matrix specified by the user.
Other embodiments, according to the invention, relate to an apparatus for providing a bit stream representing a multichannel audio signal.
Other embodiments, according to the invention, relate to a method for providing an upmix signal representation based on a downmix signal representation and object-related parametric information, which are included in a bit stream representation of an audio content, and depending on an interpretation matrix specified by the user.
Other embodiments, according to the invention, relate to a method for providing a bit stream representing a multichannel audio signal.
Other embodiments, according to the invention, refer to a computer program that performs one of said methods.
Another embodiment, according to the invention, relates to a bit stream that represents a multichannel audio signal.

BACKGROUND OF THE INVENTION
In the art of audio processing, audio transmission and audio storage, there is a growing desire to handle multichannel content in order to improve the auditory impression. The use of multichannel audio content brings significant improvements for the user. For example, a three-dimensional auditory impression can be achieved, which brings improved user satisfaction in entertainment applications. However, multichannel audio content is also useful in professional environments, for example, in teleconferencing applications, because the intelligibility of the talkers can be improved by using multichannel audio playback.
However, it is also desirable to have a good trade-off between audio quality and bit rate requirements in order to avoid excessive resource consumption in low-cost or professional multi-channel applications.
Parametric techniques for the bit-rate-efficient transmission and/or storage of audio scenarios that contain multiple audio objects have recently been proposed. For example, binaural cue coding, which is described, for example, in reference [1], and parametric joint coding of audio sources, which is described, for example, in reference [2], have been proposed. Also, MPEG spatial audio object coding (SAOC) has been proposed, which is described, for example, in references [3] and [4]. MPEG spatial audio object coding is currently under standardization and is described in the unpublished reference [5].
These techniques aim at a perceptual reconstruction of the desired output scenario rather than at a waveform match.
However, in combination with user interactivity on the receiving side, these techniques can lead to poor audio quality of the output audio signals if extreme object interpretation is performed. This is described, for example, in reference [6].
In the following, such systems will be described, and it should be noted that the basic concepts also apply to the embodiments of the invention.
Figure 8 presents a system overview of such a system (here: MPEG SAOC). The MPEG SAOC system 800 shown in Figure 8 comprises a SAOC encoder 810 and a SAOC decoder 820. The SAOC encoder 810 receives a plurality of object signals x1 to xN, which can be represented, for example, as time-domain signals or as time-frequency-domain signals (for example, in the form of a set of transform coefficients of a Fourier-type transform or in the form of QMF subband signals). The SAOC encoder 810 typically also receives downmix coefficients d1 to dN, which are associated with the object signals x1 to xN. Separate sets of downmix coefficients may be available for each channel of the downmix signal. The SAOC encoder 810 is typically configured to obtain a channel of the downmix signal by combining the object signals x1 to xN according to the associated downmix coefficients d1 to dN. Typically, there are fewer downmix channels than object signals x1 to xN. In order to allow (at least approximately) a separation (or separate treatment) of the object signals on the side of the SAOC decoder 820, the SAOC encoder 810 provides both the one or more downmix signals (referred to as downmix channels) 812 and parallel information 814. The parallel information 814 describes characteristics of the object signals x1 to xN, in order to allow object-specific processing on the decoder side.
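The combination of object signals into one downmix channel described above can be illustrated with a minimal sketch. It assumes plain time-domain sample lists and a single (mono) downmix channel; the function name `downmix` and the example values are hypothetical and are not taken from the source.

```python
def downmix(object_signals, coefficients):
    """Combine N object signals into one downmix channel.

    Each output sample is sum_i d_i * x_i[t], i.e. a weighted sum of the
    object signals using the associated downmix coefficients.
    """
    if len(object_signals) != len(coefficients):
        raise ValueError("need one downmix coefficient per object signal")
    length = len(object_signals[0])
    if any(len(x) != length for x in object_signals):
        raise ValueError("all object signals must have the same length")
    return [
        sum(d * x[t] for d, x in zip(coefficients, object_signals))
        for t in range(length)
    ]


# Hypothetical example: three short object signals and their coefficients.
x = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 2.0, 0.0]]
d = [1.0, 0.5, 0.25]
mono = downmix(x, d)
```

For a downmix with several channels, the same function could be called once per channel with that channel's own set of coefficients.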
The SAOC decoder 820 is configured to receive both the one or more downmix signals 812 and the parallel information 814. Also, the SAOC decoder 820 is typically configured to receive user interaction information and/or user control information 822, which describes a desired interpretation setting. For example, the user interaction information/user control information 822 can describe a speaker configuration and the desired spatial placement of the objects that provide the object signals x1 to xN.
The SAOC decoder 820 is configured to provide, for example, a plurality of decoded upmix channel signals y1 to yM. The upmix channel signals can, for example, be associated with individual speakers of a multi-speaker interpretation arrangement. The SAOC decoder 820 can, for example, comprise an object separator 820a, which is configured to reconstruct, at least approximately, the object signals x1 to xN based on the one or more downmix signals 812 and the parallel information 814, thus obtaining reconstructed object signals 820b. However, the reconstructed object signals 820b may deviate somewhat from the original object signals x1 to xN, for example, because the parallel information 814 is not quite sufficient for a perfect reconstruction due to bit rate limitations. The SAOC decoder 820 can further comprise a mixer 820c, which can be configured to receive the reconstructed object signals 820b and the user interaction information/user control information 822, and to provide, based on that, the upmix channel signals y1 to yM. The mixer 820c can be configured to use the user interaction information/user control information 822 to determine the contribution of the individual reconstructed object signals 820b to the upmix channel signals y1 to yM. The user interaction information/user control information 822 can, for example, comprise interpretation parameters (also referred to as interpretation coefficients), which determine the contribution of the individual reconstructed object signals 820b to the upmix channel signals y1 to yM.
However, it should be noted that in many embodiments, the object separation, which is indicated by the object separator 820a in Figure 8, and the mixing, which is indicated by the mixer 820c in Figure 8, are performed in a single step. For this purpose, overall parameters can be computed that describe a direct mapping of the one or more downmix signals 812 onto the upmix channel signals y1 to yM. These parameters can be computed based on the parallel information 814 and the user interaction information/user control information 822.
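Why the two steps can be merged may be easier to see in a small sketch. It assumes that both the object separation and the mixing are linear operations that can be written as matrices; the separation matrix shown here is a hypothetical placeholder, since the source does not state how it is derived from the parallel information. All names and values are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical example values (not taken from the source):
# K = 1 downmix channel with 3 samples, N = 3 objects, M = 2 output channels.
downmix_signal = [[2.0, -4.0, 8.0]]               # K x T
separation = [[0.5], [0.25], [0.25]]              # N x K, assumed to be given
mixing = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]       # M x N, from the user settings

# Two-step processing: reconstruct the objects, then mix them.
reconstructed_objects = matmul(separation, downmix_signal)   # N x T
upmix_two_step = matmul(mixing, reconstructed_objects)       # M x T

# One-step processing: precompute the direct mapping, then apply it once.
direct_mapping = matmul(mixing, separation)                  # M x K
upmix_one_step = matmul(direct_mapping, downmix_signal)      # M x T
```

Under these assumptions both paths give the same upmix channel signals, but the one-step path applies only an M x K matrix to the signals, so its per-sample cost does not grow with the number of objects.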
Referring now to Figures 9a, 9b and 9c, different apparatus for obtaining an upmix signal representation based on a downmix signal representation and parallel information related to the object will be described. Figure 9a shows a schematic block diagram of an MPEG SAOC system 900 comprising a SAOC decoder 920. The SAOC decoder 920 comprises, as separate functional blocks, an object decoder 922 and a mixer/interpreter 926. The object decoder 922 provides a plurality of reconstructed object signals 924 depending on the downmix signal representation (for example, in the form of one or more downmix signals represented in the time domain or in the time-frequency domain) and the parallel information related to the object (for example, in the form of object metadata). The mixer/interpreter 926 receives the reconstructed object signals 924 associated with a plurality of N objects and provides, based on this, one or more upmix channel signals 928. In the SAOC decoder 920, the extraction of the object signals 924 is performed separately from the mixing/interpretation, which allows a separation of the object decoding functionality from the mixing/interpretation functionality, but brings with it a relatively high computational complexity.
Referring now to Figure 9b, another MPEG SAOC system 930 will be briefly discussed, which comprises a SAOC decoder 950. The SAOC decoder 950 provides a plurality of upmix channel signals 958 depending on a downmix signal representation (for example, in the form of one or more downmix signals) and parallel information related to the object (for example, in the form of object metadata). The SAOC decoder 950 comprises a combined object decoder and mixer/interpreter, which is configured to obtain the upmix channel signals 958 in a joint mixing process without a separation of the object decoding and the mixing/interpretation, in which the parameters for said joint upmix process depend both on the parallel information related to the object and on the interpretation information. The joint upmix process also depends on the downmix information, which is considered to be part of the parallel information related to the object.
To summarize the above, provision of the upmix channel signals 928, 958 can be performed in a one-step process or a two-step process.
Referring now to Figure 9c, an MPEG SAOC system 960 will be described. The SAOC system 960 comprises a SAOC to MPEG Surround transcoder 980, rather than a SAOC decoder.
The SAOC to MPEG Surround transcoder comprises a parallel information transcoder 982, which is configured to receive the parallel information related to the object (for example, in the form of object metadata) and, optionally, information about the one or more downmix signals and the interpretation information. The parallel information transcoder is also configured to provide MPEG Surround parallel information (for example, in the form of an MPEG Surround bit stream) based on the received data. Accordingly, the parallel information transcoder 982 is configured to transform object-related (parametric) parallel information, which is received from the object encoder, into channel-related (parametric) parallel information, taking into account the interpretation information and, optionally, the information about the content of the one or more downmix signals.
Optionally, the SAOC to MPEG Surround transcoder 980 can be configured to manipulate the one or more downmix signals, described, for example, by the downmix signal representation, to obtain a manipulated downmix signal representation 988. However, the downmix signal manipulator 986 can be omitted, so that the output downmix signal representation 988 of the SAOC to MPEG Surround transcoder 980 is identical to the input downmix signal representation of the SAOC to MPEG Surround transcoder. The downmix signal manipulator 986 can, for example, be used if the channel-related MPEG Surround parallel information 984 would not allow a desired auditory impression to be provided based on the input downmix signal representation of the SAOC to MPEG Surround transcoder 980, which may be the case in some interpretation constellations.
Accordingly, the SAOC to MPEG Surround transcoder 980 provides the downmix signal representation 988 and the MPEG Surround bit stream 984, so that a plurality of upmix channel signals, which represent the audio objects according to the interpretation information input to the SAOC to MPEG Surround transcoder 980, can be generated using an MPEG Surround decoder that receives the MPEG Surround bit stream 984 and the downmix signal representation 988.
To summarize the above, different concepts for decoding SAOC-encoded audio signals can be used. In some cases, a SAOC decoder is used, which provides upmix channel signals (for example, upmix channel signals 928, 958) depending on the downmix signal representation and the parametric parallel information related to the object. Examples of this concept can be seen in Figures 9a and 9b. Alternatively, the SAOC-encoded audio information can be transcoded to obtain a downmix signal representation (for example, a downmix signal representation 988) and channel-related parallel information (for example, the channel-related MPEG Surround bit stream 984), which can be used by an MPEG Surround decoder to provide the desired upmix channel signals.
In the MPEG SAOC system 800, a system overview of which is given in Figure 8, the general processing is performed in a frequency-selective manner and can be described as follows within each frequency band:
• N input audio object signals x1 to xN are downmixed as part of the SAOC encoder processing. For a mono downmix, the downmix coefficients are denoted by d1 to dN. In addition, the SAOC encoder 810 extracts parallel information 814 that describes the characteristics of the input audio objects. For MPEG SAOC, the relations of the object energies with respect to each other are the most basic form of such parallel information.
• The downmix signal (or signals) 812 and the parallel information 814 are transmitted and/or stored. For this purpose, the downmix audio signal can be compressed using well-known perceptual audio coders, such as MPEG-1 Layer II or III (also known as ".mp3"), MPEG Advanced Audio Coding (AAC) or any other audio coder.
• At the receiving end, the SAOC decoder 820 conceptually attempts to restore the original object signals ("object separation") using the transmitted parallel information 814 (and, of course, the one or more downmix signals 812). These approximate object signals (also referred to as reconstructed object signals 820b) are then mixed into a target scenario represented by M audio output channels (which can, for example, be represented by the upmix channel signals y1 to yM) using an interpretation matrix. For a mono output, the coefficients of the interpretation matrix are given by r1 to rN.
• Effectively, the separation of the object signals is rarely performed (or even never performed), since both the separation step (indicated by the object separator 820a) and the mixing step (indicated by the mixer 820c) are combined into a single transcoding step, which often results in an enormous reduction in computational complexity.
It has been found that such a scheme is tremendously efficient, both in terms of transmission bit rate (it is only necessary to transmit a few downmix channels plus some parallel information instead of N discrete object audio signals or a discrete system) and in terms of computational complexity (the processing complexity relates mainly to the number of output channels rather than the number of audio objects). Additional benefits for the user at the receiving end include the freedom to choose an interpretation setting of his or her choice (mono, stereo, surround, virtualized headphone playback, and so on) and the feature of user interactivity: the interpretation matrix and, thus, the output scenario, can be set and changed interactively by the user according to will, personal preference or other criteria. For example, it is possible to locate the talkers of one group together in one spatial area to maximize the discrimination from the other remaining talkers. This interactivity is achieved by providing a decoder user interface:
For each transmitted sound object, its relative level and (for non-mono interpretation) its spatial interpretation position can be adjusted. This can happen in real time as the user changes the position of the associated graphical user interface (GUI) sliders (for example: object level = +5 dB, object position = -30 deg).
However, it has been found that the decoder-side choice of the parameters for the provision of the upmix signal representation (for example, the upmix channel signals y1 to yM) brings audible degradation in some cases.
In view of this situation, it is the objective of the present invention to create a concept that allows audible distortions to be reduced or even avoided when providing an upmix signal representation (for example, in the form of upmix channel signals y1 to yM).

SUMMARY OF THE INVENTION
One embodiment, according to the invention, creates an apparatus for providing an upmix signal representation based on a downmix signal representation and object-related parametric information, which are included in a bit stream representation of an audio content, and depending on an interpretation matrix specified by the user. The apparatus comprises a distortion limiter configured to obtain a modified interpretation matrix using a linear combination of a user-specified interpretation matrix and a target interpretation matrix depending on a linear combination parameter. The apparatus also comprises a signal processor configured to obtain the upmix signal representation based on the downmix signal representation and the parametric information related to the object using the modified interpretation matrix. The apparatus is configured to evaluate a bit stream element that represents the linear combination parameter in order to obtain the linear combination parameter.
This embodiment, according to the invention, is based on the main idea that audible distortions of the upmix signal representation can be reduced or even avoided with low computational complexity by performing a linear combination of a user-specified interpretation matrix and a target interpretation matrix depending on a linear combination parameter, which is extracted from the bit stream representation of the audio content. This is because a linear combination can be performed efficiently, and because the demanding task of determining the linear combination parameter can be performed on the side of the audio signal encoder, where there is typically more computational power available than on the side of the audio signal decoder (the apparatus for providing an upmix signal representation).
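The linear combination performed by the distortion limiter can be sketched as follows. The sketch assumes a parameter g in the range 0 to 1 and the weighting (1 - g) for the user-specified matrix and g for the target matrix; this weighting agrees with the statement elsewhere in the text that lower parameter values mean a stronger user contribution, but the exact form is an assumption. The function name `limit_distortion` is hypothetical.

```python
def limit_distortion(user_matrix, target_matrix, g):
    """Blend a user-specified and a target interpretation matrix.

    Assumed form: modified = (1 - g) * user + g * target, with 0 <= g <= 1,
    so that g = 0 keeps the user's matrix and g = 1 yields the target matrix.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("linear combination parameter must lie in [0, 1]")
    if len(user_matrix) != len(target_matrix):
        raise ValueError("matrices must have the same number of rows")
    modified = []
    for user_row, target_row in zip(user_matrix, target_matrix):
        if len(user_row) != len(target_row):
            raise ValueError("matrices must have the same number of columns")
        modified.append(
            [(1.0 - g) * u + g * t for u, t in zip(user_row, target_row)]
        )
    return modified


# Hypothetical example: two output channels, two objects.
user = [[1.0, 0.0], [0.0, 1.0]]        # extreme separation requested by the user
target = [[0.5, 0.5], [0.5, 0.5]]      # some low-distortion target
halfway = limit_distortion(user, target, 0.5)
```

Because the result has the same shape as the user-specified matrix, it could be handed to an unchanged signal processor in place of that matrix.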
Accordingly, the concept discussed above makes it possible to obtain a modified interpretation matrix, which results in reduced audible distortions even for an inadequate choice of the user-specified interpretation matrix, without adding any significant complexity to the apparatus for providing an upmix signal representation. In particular, it may even be unnecessary to modify the signal processor when compared to an apparatus without a distortion limiter, because the modified interpretation matrix constitutes an input quantity to the signal processor and merely replaces the user-specified interpretation matrix. In addition, the inventive concept has the advantage that an audio signal encoder can adjust the distortion limitation scheme, which is applied on the audio signal decoder side, according to requirements specified on the encoder side by simply setting the linear combination parameter, which is included in the bit stream representation of the audio content. Accordingly, the audio signal encoder can gradually provide more or less freedom regarding the choice of the interpretation matrix to the user of the decoder (the apparatus for providing an upmix signal representation) by appropriately choosing the linear combination parameter. This allows the audio signal decoder to be adapted to the user's expectations for a given service, since for some services a user may expect maximum quality (which implies reducing the user's possibility to arbitrarily adjust the interpretation matrix), while for other services the user may typically expect a maximum degree of freedom (which implies increasing the impact of the user-specified interpretation matrix on the result of the linear combination).
To summarize the above, the inventive concept combines high computational efficiency on the decoder side, which can be particularly important for portable audio decoders, with the possibility of a simple implementation, without bringing the need to modify the signal processor and also provides a high degree of control for an audio signal encoder, which can be important to meet user expectations for different types of audio services.
In a preferred embodiment, the distortion limiter is configured to obtain the target interpretation matrix so that the target interpretation matrix is a distortion-free target interpretation matrix. This brings the possibility of having a playback scenario in which there are no distortions, or at least hardly any distortions, caused by the choice of the interpretation matrix. Also, it has been found that the computation of a distortion-free target interpretation matrix can be performed in a very simple way in some cases. Furthermore, it has been found that an interpretation matrix that is chosen between a user-specified interpretation matrix and a distortion-free target interpretation matrix typically results in a good auditory impression.
In a preferred embodiment, the distortion limiter is configured to obtain the target interpretation matrix so that the target interpretation matrix is a downmix-similar target interpretation matrix. It has been found that the use of a downmix-similar target interpretation matrix brings a very low or even minimal degree of distortion. Also, such a downmix-similar target interpretation matrix can be obtained with very low computational effort, because it can be obtained by scaling the entries of the downmix matrix with a common scale factor and adding some additional zero entries.
In a preferred embodiment, the distortion limiter is configured to scale an extended downmix matrix using an energy normalization scalar to obtain the target interpretation matrix, where the extended downmix matrix is an extended version of the downmix matrix (a row of the downmix matrix describes contributions of a plurality of audio object signals to a channel of the downmix signal representation), extended by rows of zero elements, so that the number of rows of the extended downmix matrix is identical to that of the interpretation constellation described by the user-specified interpretation matrix. Thus, the target interpretation matrix is obtained by copying values from the downmix matrix into the extended downmix matrix, adding zero matrix entries, and multiplying all matrix elements by the same energy normalization scalar. All of these operations can be performed very efficiently, so that the target interpretation matrix can be obtained quickly, even in a very simple audio decoder.
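The construction of a downmix-similar target matrix can be illustrated by the sketch below. The copying of downmix rows and the zero-row extension follow the description above; the particular choice of the energy normalization scalar (square root of the ratio of the total squared entries of the user-specified matrix to those of the downmix matrix, with a small constant to avoid division by zero) is an assumption made for illustration, as are all names.

```python
import math


def downmix_similar_target(downmix_matrix, user_matrix, eps=1e-9):
    """Build a downmix-similar target matrix with as many rows as user_matrix.

    downmix_matrix: K rows (downmix channels) x N columns (objects).
    user_matrix:    M rows (output channels)  x N columns (objects), M >= K.
    """
    num_out = len(user_matrix)
    num_obj = len(user_matrix[0])
    if len(downmix_matrix) > num_out:
        raise ValueError("more downmix channels than output channels")
    if any(len(row) != num_obj for row in downmix_matrix):
        raise ValueError("downmix and user matrices need the same object count")

    # Copy the downmix rows, then append rows of zeros up to M rows.
    extended = [list(row) for row in downmix_matrix]
    extended += [[0.0] * num_obj for _ in range(num_out - len(downmix_matrix))]

    # Assumed normalization: match the total energy of the user's matrix.
    user_energy = sum(v * v for row in user_matrix for v in row)
    downmix_energy = sum(v * v for row in downmix_matrix for v in row)
    scale = math.sqrt((user_energy + eps) / (downmix_energy + eps))

    # One common scalar multiplies every element.
    return [[scale * v for v in row] for row in extended]


# Hypothetical example: mono downmix of two objects, stereo output.
target = downmix_similar_target([[1.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])
```

In this example the first target row is a scaled copy of the downmix row and the second row is all zeros, so only copying, zero filling and a single scalar multiplication are involved.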
In a preferred embodiment, the distortion limiter is configured to obtain the target interpretation matrix so that the target interpretation matrix is a best-effort target interpretation matrix. Although this approach is computationally somewhat more demanding than the use of a downmix-similar target interpretation matrix, the use of a best-effort target interpretation matrix provides better consideration of the user's desired interpretation scenario. Using the best-effort target interpretation matrix, the user's definition of the desired interpretation matrix is taken into account when determining the target interpretation matrix as far as possible without introducing distortions or significant distortions. In particular, the best-effort target interpretation matrix takes into account the user's desired loudness for a plurality of speakers (or channels of the upmix signal representation). Accordingly, an improved auditory impression can result from the use of the best-effort target interpretation matrix.
In a preferred embodiment, the distortion limiter is configured to obtain the target interpretation matrix so that the target interpretation matrix depends on a downmix matrix and the user-specified interpretation matrix. Accordingly, the target interpretation matrix is relatively close to the user's expectations, but still provides a substantially distortion-free audio interpretation. Thus, the linear combination parameter determines a trade-off between an approximation of the user's desired interpretation and a minimization of audible distortions, in which the consideration of the user-specified interpretation matrix for the computation of the target interpretation matrix provides a good satisfaction of the user's wishes, even if the linear combination parameter indicates that the target interpretation matrix should dominate the linear combination.
In a preferred embodiment, the distortion limiter is configured to compute a matrix comprising individual energy normalization values per channel for a plurality of audio output channels of the apparatus for providing an upmix signal representation, so that an energy normalization value for a given output channel of the apparatus describes, at least approximately, a ratio between a sum of energy interpretation values associated with the given output channel in the user-specified interpretation matrix for a plurality of audio objects, and a sum of downmix energy values for the plurality of audio objects. Accordingly, a user's expectation regarding the loudness of the different output channels of the apparatus can be met to some degree.
In this case, the distortion limiter is configured to scale a set of downmix values using the associated individual energy normalization value per channel, to obtain a set of interpretation values of the target interpretation matrix associated with the given output channel. Accordingly, the relative contribution of a given audio object to an output channel of the apparatus is identical to the relative contribution of that audio object to the downmix signal representation, which makes it possible to substantially avoid the audible distortions that would be caused by a modification of the relative contributions of the audio objects. Accordingly, each of the output channels of the apparatus is substantially undistorted. Nevertheless, the user's expectation regarding a loudness distribution over a plurality of speakers (or channels of the upmix signal representation) is taken into account, although details on where to place which audio object and/or how to change the relative intensities of the audio objects with respect to each other are left out of consideration (at least to some degree) in order to avoid distortions that would possibly be caused by an excessively sharp spatial separation of the audio objects or an excessive modification of the relative intensities of the audio objects.
Thus, the evaluation of the ratio between a sum of the energy interpretation values (for example, squares of magnitude interpretation values) associated with a given output channel in the user-specified interpretation matrix for a plurality of audio objects and a sum of downmix energy values for the plurality of audio objects makes it possible to consider all output audio channels, even if the downmix signal representation comprises fewer channels, while still avoiding distortions that would be caused by a spatial redistribution of the audio objects or by an excessive alteration of the relative loudness of the different audio objects.
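For a mono downmix, the per-channel energy normalization described above might be sketched as follows. The energy ratio per output channel follows the text; applying its square root as an amplitude gain to the downmix values, and the small constant that guards against division by zero, are assumptions of this sketch. The function name is hypothetical.

```python
import math


def best_effort_target_mono(downmix_row, user_matrix, eps=1e-9):
    """Per-channel scaled copies of a mono downmix row.

    For each output channel m, the ratio
        sum_j r[m][j]**2 / sum_j d[j]**2
    is evaluated and (as an assumption) its square root is used as the gain
    applied to the downmix values d[j] to form row m of the target matrix.
    """
    downmix_energy = sum(d * d for d in downmix_row)
    target = []
    for user_row in user_matrix:
        if len(user_row) != len(downmix_row):
            raise ValueError("one coefficient per audio object is required")
        channel_energy = sum(r * r for r in user_row)
        gain = math.sqrt((channel_energy + eps) / (downmix_energy + eps))
        target.append([gain * d for d in downmix_row])
    return target


# Hypothetical example: two objects, stereo output, mono downmix d = (1, 1).
d = [1.0, 1.0]
user = [[2.0, 0.0], [0.0, 1.0]]
target = best_effort_target_mono(d, user)
```

In this example each target row keeps the 1:1 object balance of the downmix, while the energy of each row follows the energy the user requested for that output channel (4 and 1, respectively).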
In a preferred embodiment, the distortion limiter is configured to compute a matrix that describes an individual energy normalization per channel for a plurality of audio output channels of the apparatus for providing an upmix signal representation, depending on the user-specified interpretation matrix and a downmix matrix. In this case, the distortion limiter is configured to apply the matrix that describes the individual energy normalization per channel to obtain a set of interpretation coefficients of the target interpretation matrix associated with a given output channel of the apparatus as a linear combination of sets of downmix values (that is, values that describe a scaling applied to the audio signals of different audio objects to obtain a channel of the downmix signal) associated with the different channels of the downmix signal representation. Using this concept, a target interpretation matrix that is well adapted to the desired user-specified interpretation matrix can be obtained even if the downmix signal representation comprises more than one audio channel, while still substantially avoiding distortions. It has been found that the formation of a linear combination of sets of downmix values results in a set of interpretation coefficients that typically causes only small audible distortions. Nevertheless, it has been found that it is possible to approximate a user's expectation using such an approach to derive the target interpretation matrix.
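One way to form target rows as linear combinations of the downmix rows is sketched below for a two-channel downmix. The source does not give the formula for the weighting matrix in this passage; the least-squares weights W = R D^T (D D^T)^-1 used here are a stand-in chosen for illustration and are not claimed to be the normalization matrix of the invention. All names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(x * y for x, y in zip(a, b))


def best_effort_target_stereo(downmix_matrix, user_matrix):
    """Target rows as linear combinations of two downmix rows.

    Assumed weights: for each user row r, the pair (w0, w1) minimizing
    |r - (w0 * d0 + w1 * d1)|^2, i.e. a least-squares projection of r onto
    the span of the two downmix rows d0 and d1.
    """
    d0, d1 = downmix_matrix
    a, b, c = dot(d0, d0), dot(d0, d1), dot(d1, d1)
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("downmix channels are linearly dependent")
    target = []
    for r in user_matrix:
        p0, p1 = dot(r, d0), dot(r, d1)
        # (w0, w1) = (p0, p1) * inverse of [[a, b], [b, c]]
        w0 = (p0 * c - p1 * b) / det
        w1 = (p1 * a - p0 * b) / det
        target.append([w0 * x + w1 * y for x, y in zip(d0, d1)])
    return target


# Hypothetical example: three objects, stereo downmix, three output channels.
D = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
R = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.5], [1.0, 0.0, 0.0]]
T = best_effort_target_stereo(D, R)
```

With these weights, a user row that is already a combination of the downmix rows (the first two rows of R) is reproduced unchanged, while any other row is replaced by the closest combination of downmix rows.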
In a preferred embodiment, the apparatus is configured to read an index value that represents the linear combination parameter from the bit stream representation of the audio content, and to map the index value onto the linear combination parameter using a parameter quantization table. It has been found that this is a particularly computationally efficient concept for deriving the linear combination parameter. It has also been found that this approach brings a better trade-off between user satisfaction and computational complexity when compared to other possible concepts in which complicated computations, rather than the evaluation of a one-dimensional mapping table, are performed.
In a preferred embodiment, the quantization table describes a non-uniform quantization, in which lower values of the linear combination parameter, which describe a stronger contribution of the user-specified interpretation matrix to the modified interpretation matrix, are quantized with comparatively high resolution, and higher values of the linear combination parameter, which describe a smaller contribution of the user-specified interpretation matrix to the modified interpretation matrix, are quantized with comparatively lower resolution. It has been found that in many cases, only extreme settings of the interpretation matrix bring significant audible distortions. Accordingly, it has been found that a fine adjustment of the linear combination parameter is more important in the region of a stronger contribution of the user-specified interpretation matrix to the modified interpretation matrix, in order to obtain a setting that allows an optimal trade-off between fulfilling the user's interpretation expectation and minimizing audible distortions.
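The table-based mapping of an index value onto the linear combination parameter can be sketched in a few lines. The table values below are invented for illustration only and are not the values of any standard or of the source; they are merely chosen so that the step size is small near 0 (strong user contribution) and larger near 1.

```python
# Hypothetical non-uniform quantization table (illustrative values only):
# fine steps for small parameter values, coarse steps for large ones.
HYPOTHETICAL_PARAMETER_TABLE = (
    0.0, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.15,
    0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.85, 1.0,
)


def dequantize_parameter(index, table=HYPOTHETICAL_PARAMETER_TABLE):
    """Map an index value read from the bit stream onto the parameter."""
    if not 0 <= index < len(table):
        raise ValueError("index value outside the quantization table")
    return table[index]


g_low = dequantize_parameter(1)     # near the user-specified matrix
g_high = dequantize_parameter(15)   # target matrix dominates
```

The decoder-side cost is a single bounds check and table lookup, which is the point of transmitting an index rather than computing the parameter in the decoder.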
In a preferred embodiment, the apparatus is configured to evaluate a bit stream element that describes a distortion limiting mode. In that case, the distortion limiter is preferably configured to selectively obtain the target interpretation matrix so that the target interpretation matrix is a downmix-similar target interpretation matrix or so that the target interpretation matrix is a best-effort target interpretation matrix. This switchable concept has been found to provide an efficient possibility to achieve a good trade-off between meeting a user's interpretation expectations and minimizing audible distortions for a large number of different audio pieces. This concept also gives an audio signal encoder good control over the actual interpretation on the decoder side. Consequently, the requirements of a wide variety of different audio services can be met.
Another embodiment, according to the invention, creates an apparatus for providing a bit stream that represents a multichannel audio signal.
The apparatus comprises a downmixer configured to provide a downmix signal based on a plurality of audio object signals. The apparatus also comprises a parallel information provider configured to provide object-related parametric parallel information, which describes characteristics of the audio object signals and downmix parameters, and a linear combination parameter that describes contributions of an interpretation matrix specified by the user and of a target interpretation matrix to a modified interpretation matrix. The apparatus for providing a bit stream also comprises a bit stream formatter configured to provide a bit stream comprising a representation of the downmix signal, the object-related parametric parallel information and the linear combination parameter.
This apparatus for providing a bit stream that represents a multichannel audio signal is well suited for cooperation with the apparatus for providing an upmix signal representation discussed above. The apparatus for providing a bit stream allows the linear combination parameter to be provided depending on its knowledge of the audio object signals. Accordingly, the audio encoder (that is, the apparatus for providing a bit stream that represents a multichannel audio signal) can have a strong impact on the quality of the interpretation provided by an audio decoder (that is, the apparatus for providing an upmix signal representation discussed above) that evaluates the linear combination parameter. Thus, the apparatus for providing the bit stream has a very high level of control over the result of the interpretation, which provides for improved user satisfaction in many different scenarios. Accordingly, the audio encoder of a service provider can provide guidance, using the linear combination parameter, as to whether or not the user should be allowed to use extreme interpretation settings at the risk of audible distortions. Thus, the user's disappointment, along with the corresponding negative economic consequences, can be avoided by using the audio encoder described above.
Another embodiment, according to the invention, creates a method for providing an upmix signal representation based on a downmix signal representation and object-related parametric information, which are included in a bit stream representation of the audio content, depending on an interpretation matrix specified by the user. This method is based on the same main idea as the apparatus described above.
Another embodiment, according to the invention, creates a method for providing a bit stream that represents a multichannel audio signal. Said method is based on the same finding as the apparatus described above.
Another embodiment, in accordance with the invention, creates a computer program for carrying out the above methods.
Another embodiment, according to the invention, creates a bit stream that represents a multichannel audio signal. The bit stream comprises a representation of a downmix signal combining audio signals of a plurality of audio objects, and object-related parametric parallel information that describes characteristics of the audio objects. The bit stream also comprises a linear combination parameter that describes contributions of a user-specified interpretation matrix and of a target interpretation matrix to a modified interpretation matrix. Said bit stream allows the audio signal encoder side some degree of control over the decoder-side interpretation parameters.
BRIEF DESCRIPTION OF THE FIGURES
The embodiments in accordance with the present invention will subsequently be described with reference to the attached figures, in which:
Figure 1a shows a schematic block diagram of an apparatus for providing an upmix signal representation, in accordance with an embodiment of the invention;
Figure 1b shows a schematic block diagram of an apparatus for providing a bit stream representing a multichannel audio signal, in accordance with an embodiment of the invention;
Figure 2 shows a schematic block diagram of an apparatus for providing an upmix signal representation, according to another embodiment of the invention;
Figure 3a shows a schematic representation of a bit stream representing a multichannel audio signal, according to an embodiment of the invention;
Figure 3b shows a detailed syntax representation of SAOC-specific configuration information, in accordance with an embodiment of the invention;
Figure 3c shows a detailed syntax representation of SAOC frame information, according to an embodiment of the invention;
Figure 3d shows a schematic representation of an encoding of a distortion control mode in a bit stream element "bsDcuMode" that can be used in a SAOC bit stream;
Figure 3e shows a table representation of an association between a bit stream index idx and a value of a linear combination parameter "DcuParam[idx]", which can be used to encode linear combination information in a SAOC bit stream;
Figure 4 shows a schematic block diagram of an apparatus for providing an upmix signal representation, in accordance with another embodiment of the invention;
Figure 5a shows a syntax representation of SAOC-specific configuration information, according to an embodiment of the invention;
Figure 5b shows a table representation of an association between a bit stream index idx and a linear combination parameter Param[idx] that can be used to encode the linear combination parameter in a SAOC bit stream;
Figure 6a shows a table that describes listening test conditions;
Figure 6b shows a table that describes audio items of the listening tests;
Figure 6c shows a table that describes tested downmix/interpretation conditions for a stereo-to-stereo SAOC decoding scenario;
Figure 7 shows a graphical representation of the distortion control unit (DCU) listening test results for a stereo-to-stereo SAOC scenario;
Figure 8 shows a schematic block diagram of a reference MPEG SAOC system;
Figure 9a shows a schematic block diagram of a reference SAOC system using a separate decoder and mixer;
Figure 9b shows a schematic block diagram of a reference SAOC system using an integrated decoder and mixer; and
Figure 9c shows a schematic block diagram of a reference SAOC system using a SAOC to MPEG Surround transcoder.
DETAILED DESCRIPTION OF THE EMBODIMENTS
1. APPARATUS FOR PROVIDING AN UPMIX SIGNAL REPRESENTATION, ACCORDING TO FIGURE 1a
Figure 1a shows a schematic block diagram of an apparatus for providing an upmix signal representation, in accordance with an embodiment of the invention. Apparatus 100 is configured to receive a downmix signal representation 110 and object-related parametric information 112. Apparatus 100 is also configured to receive a linear combination parameter 114. The downmix signal representation 110, the object-related parametric information 112 and the linear combination parameter 114 are all included in a bit stream representation of an audio content.
For example, the linear combination parameter 114 is described by a bit stream element within said bit stream representation. The apparatus 100 is also configured to receive interpretation information 120, which defines an interpretation matrix specified by the user. The apparatus 100 is configured to provide an upmix signal representation 130, for example, individual channel signals or an MPEG Surround downmix signal in combination with MPEG Surround parallel information. Apparatus 100 comprises a distortion limiter 140 which is configured to obtain a modified interpretation matrix 142 using a linear combination of a user-specified interpretation matrix 144 (which is described, directly or indirectly, by interpretation information 120) and a target interpretation matrix depending on a linear combination parameter 146, which can, for example, be designated with $g_{DCU}$.
The apparatus 100 can, for example, be configured to evaluate a bit stream element 114 that represents the linear combination parameter 146 in order to obtain said linear combination parameter.
Apparatus 100 also comprises a signal processor 148 which is configured to obtain the upmix signal representation 130 based on the downmix signal representation 110 and the parametric information related to object 112 using the modified interpretation matrix 142.
Accordingly, apparatus 100 is capable of providing an upmix signal representation with good interpretation quality using, for example, a SAOC signal processor 148, or any other object-related signal processor 148. The modified interpretation matrix 142 is adapted by the distortion limiter 140 so that a sufficiently good hearing impression with sufficiently small distortions is achieved in most or all cases. The modified interpretation matrix typically lies "in between" the user-specified (desired) interpretation matrix and the target interpretation matrix, in which the degree of similarity of the modified interpretation matrix to the user-specified interpretation matrix and to the target interpretation matrix is determined by the linear combination parameter, which consequently allows an adjustment of an achievable interpretation quality and/or of a maximum distortion level of the upmix signal representation 130. The signal processor 148 can, for example, be a SAOC signal processor. Accordingly, signal processor 148 can be configured to evaluate the object-related parametric information 112 to obtain parameters that describe characteristics of the audio objects represented, in a downmixed form, by the downmix signal representation 110. In addition, the signal processor 148 can obtain (for example, receive) parameters that describe the downmix procedure, which is used on the side of an audio encoder providing the bit stream representation of the audio content in order to derive the downmix signal representation 110 by combining the audio object signals of a plurality of audio objects.
Thus, signal processor 148 can, for example, evaluate an object level difference information OLD that describes a level difference between a plurality of audio objects for a given audio frame and for one or more frequency bands, and an inter-object correlation information IOC that describes a correlation between audio signals of a plurality of pairs of audio objects for a given audio frame and for one or more frequency bands. In addition, signal processor 148 can also evaluate downmix information DMG, DCLD that describes a downmix, which is performed on the side of an audio encoder that provides the bit stream representation of the audio content, for example, in the form of one or more downmix gain parameters DMG and one or more downmix channel level difference parameters DCLD.
In addition, the signal processor 148 receives the modified interpretation matrix 142, which indicates which audio channels of the upmix signal representation 130 should comprise the audio content of the different audio objects. Accordingly, signal processor 148 is configured to determine the contributions of the different audio objects to the downmix signal representation 110 using its knowledge of the audio objects (obtained from the OLD information and the IOC information) as well as its knowledge of the downmix process (obtained from the DMG information and the DCLD information). In addition, the signal processor provides the upmix signal representation so that the modified interpretation matrix 142 is taken into account.
Accordingly, signal processor 148 performs the functionality of the SAOC decoder 820, in which the downmix signal representation 110 takes the place of the one or more downmix signals 812, in which the object-related parametric information 112 takes the place of the parallel information 814, and in which the modified interpretation matrix 142 takes the place of the user interaction/control information 822. The channel signals provided by the SAOC decoder 820 take the role of the upmix signal representation 130. Accordingly, reference is made to the description of the SAOC decoder 820.
Similarly, signal processor 148 can take the role of the decoder/mixer 920, in which the downmix signal representation 110 takes the role of the one or more downmix signals, in which the object-related parametric information 112 takes the role of the object metadata, in which the modified interpretation matrix 142 takes the role of the interpretation information input into the mixer/interpreter 926, and in which the channel signal 928 takes the role of the upmix signal representation 130.
Alternatively, signal processor 148 can perform the functionality of the integrated decoder and mixer 950, in which the downmix signal representation 110 can take the role of the one or more downmix signals, in which the object-related parametric information 112 can take the role of the object metadata, in which the modified interpretation matrix 142 can take the role of the interpretation information input into the integrated object decoder and mixer/interpreter 950, and in which the channel signals 958 can take the role of the upmix signal representation 130.
Alternatively, signal processor 148 can perform the functionality of the SAOC to MPEG Surround transcoder 980, in which the downmix signal representation 110 can take the role of the one or more downmix signals, in which the object-related parametric information 112 can take the role of the object metadata, in which the modified interpretation matrix 142 can take the role of the interpretation information, and in which the one or more downmix signals 988 in combination with the MPEG Surround bit stream 984 can take the role of the upmix signal representation 130.
Accordingly, for details on the functionality of signal processor 148, reference is made to the description of the SAOC decoder 820, the separate decoder and mixer 920, the integrated decoder and mixer 950 and the SAOC to MPEG Surround transcoder 980. Reference is also made, for example, to documents [3] and [4] regarding the functionality of signal processor 148, in which the modified interpretation matrix 142, instead of the user-specified interpretation matrix 120, takes the role of the input interpretation information in the embodiments according to the invention.
Additional details regarding the functionality of the distortion limiter 140 will be described below.
2. APPARATUS FOR PROVIDING A BIT STREAM REPRESENTING A MULTICHANNEL AUDIO SIGNAL, ACCORDING TO FIGURE 1b
Figure 1b shows a schematic block diagram of an apparatus 150 for providing a bit stream that represents a multichannel audio signal.
Apparatus 150 is configured to receive a plurality of audio object signals 160a through 160N. The apparatus 150 is further configured to provide a bit stream 170 that represents the multichannel audio signal, which is described by the audio object signals 160a to 160N.
Apparatus 150 comprises a downmixer 180 which is configured to provide a downmix signal 182 based on the plurality of audio object signals 160a to 160N. The apparatus 150 also comprises a parallel information provider 184 which is configured to provide object-related parametric parallel information 186, which describes characteristics of the audio object signals 160a to 160N and downmix parameters used by downmixer 180. The parallel information provider 184 is also configured to provide a linear combination parameter 188 that describes a desired contribution of a (desired) user-specified interpretation matrix and of a (low distortion) target interpretation matrix to a modified interpretation matrix.
The object-related parametric parallel information 186 may, for example, comprise object level difference information (OLD) describing object level differences of the audio object signals 160a to 160N (for example, in a band-wise manner). The object-related parametric parallel information can also comprise inter-object correlation information (IOC) that describes correlations between the audio object signals 160a to 160N. In addition, the object-related parametric parallel information can describe the downmix gain (for example, in an object-wise manner), in which the downmix gain values are used by the downmixer 180 in order to obtain the downmix signal 182 combining the audio object signals 160a to 160N. The object-related parametric parallel information 186 may comprise downmix channel level difference information (DCLD), which describes the differences between the downmix levels for multiple channels of the downmix signal 182 (for example, if the downmix signal 182 is a multichannel signal). The linear combination parameter 188 can, for example, be a numeric value between 0 and 1, which indicates the use of only the interpretation matrix specified by the user (for example, for a parameter value of 0), of only the target interpretation matrix (for example, for a parameter value of 1), or of a particular combination of the user-specified interpretation matrix and the target interpretation matrix in between these extremes (for example, for parameter values between 0 and 1).
Apparatus 150 also comprises a bit stream formatter 190 which is configured to provide bit stream 170 so that the bit stream comprises a representation of the downmix signal 182, the object-related parametric parallel information 186 and the linear combination parameter 188.
Accordingly, the apparatus 150 performs the functionality of the SAOC encoder 810, according to Figure 8, or of the channel encoder, according to Figures 9a to 9c. The audio object signals 160a to 160N are equivalent to the object signals x1 to xN received, for example, by the SAOC encoder 810. The downmix signal 182 can, for example, be equivalent to the one or more downmix signals 812. The object-related parametric parallel information 186 can, for example, be equivalent to the parallel information 814 or to the object metadata. However, in addition to said 1-channel downmix signal or multichannel downmix signal 182 and said object-related parametric parallel information 186, bit stream 170 can also encode the linear combination parameter 188.
Accordingly, the apparatus 150, which can be considered an audio encoder, has an impact on the decoder-side handling of the distortion control scheme, which is performed by the distortion limiter 140, by appropriately setting the linear combination parameter 188, so that apparatus 150 can expect a sufficient interpretation quality to be provided by an audio decoder (e.g., apparatus 100) that receives bit stream 170.
For example, the parallel information provider 184 may adjust the linear combination parameter depending on quality requirement information, which is received from an optional user interface 199 of the apparatus 150. Alternatively or in addition, the parallel information provider 184 can also take into account the characteristics of the audio object signals 160a to 160N and the downmix parameters of the downmixer 180. For example, the apparatus 150 can estimate a degree of distortion that would be obtained in an audio decoder under the assumption of one or more worst-case user-specified interpretation matrices, and can adjust the linear combination parameter 188 so that an interpretation quality, which is expected to be obtained by the audio signal decoder when taking that linear combination parameter into account, is still considered to be sufficient by the parallel information provider 184. For example, the apparatus 150 can set the linear combination parameter 188 to a value that allows a strong impact of the user (influence of the interpretation matrix specified by the user) on the modified interpretation matrix, if the parallel information provider 184 finds that the audio quality of an upmix signal representation would not be seriously degraded even in the presence of extreme user-specified interpretation settings. This may, for example, be the case if the audio object signals 160a to 160N are sufficiently similar. Conversely, the parallel information provider 184 can set the linear combination parameter 188 to a value that allows a comparatively small impact of the user (or of the user-specified interpretation matrix) if the parallel information provider 184 finds that extreme interpretation settings could lead to strong audible distortions.
This may, for example, be the case if the audio object signals 160a to 160N are significantly different, so that a clear separation of audio objects on the audio decoder side is difficult (or associated with audible distortions).
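The encoder-side reasoning above can be illustrated by a minimal sketch. The description does not specify how the similarity of the audio object signals is measured or how it is mapped onto the linear combination parameter; the normalized correlation measure, the linear mapping and all function names below are assumptions made purely for illustration.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equally long sample sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def choose_linear_combination_parameter(object_signals):
    """Hypothetical heuristic: similar objects -> value near 0 (strong user
    influence allowed), dissimilar objects -> value near 1 (target dominates)."""
    n = len(object_signals)
    if n < 2:
        return 0.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    similarity = sum(
        abs(normalized_correlation(object_signals[i], object_signals[j]))
        for i, j in pairs
    ) / len(pairs)
    # Assumed linear mapping, clamped to the valid parameter range [0, 1].
    return min(1.0, max(0.0, 1.0 - similarity))
```

A practical encoder could instead estimate the distortion for worst-case interpretation matrices, as mentioned above; the sketch only shows the direction of the dependency.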
It should be noted here that the apparatus 150 can use, for setting the linear combination parameter 188, knowledge that is only available on the side of the apparatus 150, but not on the side of an audio decoder (for example, the apparatus 100), such as, for example, desired interpretation quality information entered into the apparatus 150 via a user interface, or detailed knowledge about the separate audio objects represented by the audio object signals 160a to 160N.
Accordingly, the parallel information provider 184 can provide the linear combination parameter 188 in a very meaningful way.
3. SAOC SYSTEM WITH DISTORTION CONTROL UNIT (DCU), ACCORDING TO FIGURE 2
3.1 STRUCTURE OF THE SAOC DECODER
In the following, the processing performed by a distortion control unit (DCU processing) will be described with reference to Figure 2, which shows a schematic block diagram of a SAOC system 200. Specifically, Figure 2 illustrates the distortion control unit DCU within the overall SAOC system.
Referring to Figure 2, the SAOC decoder 200 is configured to receive a downmix signal representation 210 that represents, for example, a 1-channel downmix signal or a 2-channel downmix signal, or even a downmix signal having more than two channels. The SAOC decoder 200 is configured to receive a SAOC bit stream 212, which comprises object-related parametric parallel information, such as, for example, object level difference information OLD, inter-object correlation information IOC, downmix gain information DMG and, optionally, downmix channel level difference information DCLD. The SAOC decoder 200 is also configured to obtain a linear combination parameter 214, which is also designated with $g_{DCU}$.
Typically, the downmix signal representation 210, the SAOC bit stream 212 and the linear combination parameter 214 are included in a bit stream representation of an audio content.
The SAOC decoder 200 is also configured to receive, for example, an interpretation matrix input 220 from a user interface. For example, the SAOC decoder 200 can receive an interpretation matrix input 220 in the form of a matrix $M_{ren}$, which defines the (user-specified, desired) contribution of a plurality of N audio objects to 1, 2 or even more output audio signal channels (of the upmix representation). The interpretation matrix $M_{ren}$ can, for example, be input from a user interface, in which the user interface can translate a different user-specified form of representation of a desired interpretation configuration into parameters of the interpretation matrix $M_{ren}$. For example, the user interface can translate an input in the form of level slider values and audio object position information into a user-specified interpretation matrix $M_{ren}$ using some mapping.
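One possible form of such a mapping, for a stereo output, is sketched below. The description leaves the mapping open ("some mapping"); the dB fader convention, the pan range of -1 to +1 and the constant-power panning law are assumptions of this sketch, as is the function name.

```python
import math


def interpretation_matrix_from_ui(levels_db, pans):
    """Build a 2 x N stereo interpretation matrix (rows: left, right).

    levels_db: per-object fader level in dB (hypothetical UI convention).
    pans: per-object pan position in [-1.0, 1.0]; -1 = left, +1 = right.
    """
    left_row, right_row = [], []
    for level_db, pan in zip(levels_db, pans):
        gain = 10.0 ** (level_db / 20.0)          # dB -> linear amplitude
        angle = (pan + 1.0) * math.pi / 4.0       # 0 .. pi/2
        left_row.append(gain * math.cos(angle))   # constant-power panning
        right_row.append(gain * math.sin(angle))
    return [left_row, right_row]
```

For example, an object panned hard left at 0 dB contributes only to the first row, whereas a centered object contributes equally to both rows.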
It should be noted here that throughout this description, the indices l, which define a parameter time interval, and m, which define a processing band, are sometimes omitted for clarity. However, it must be kept in mind that the processing can be performed individually for a plurality of subsequent parameter time intervals having indices l and for a plurality of frequency bands having frequency band indices m.
The SAOC decoder 200 also comprises a distortion control unit DCU 240 that is configured to receive the interpretation matrix specified by the user $M_{ren}$, at least part of the information of the SAOC bit stream 212 (as will be described in detail below) and the linear combination parameter 214. The distortion control unit 240 provides the modified interpretation matrix $M_{ren,lim}$.
The audio decoder 200 also comprises a SAOC decoding/transcoding unit 248, which can be considered as a signal processor, and which receives the downmix signal representation 210, the SAOC bit stream 212 and the modified interpretation matrix $M_{ren,lim}$. The SAOC decoding/transcoding unit 248 provides a representation 230 of one or more output channels, which can be considered as an upmix signal representation. The representation 230 of the one or more output channels can, for example, take the form of a frequency domain representation of individual audio signal channels, a time domain representation of individual audio channels or a parametric multichannel representation. For example, the upmix signal representation 230 may take the form of an MPEG Surround representation comprising an MPEG Surround downmix signal and MPEG Surround parallel information.
It should be noted that the SAOC decoding/transcoding unit 248 can comprise the same functionality as the signal processor 148, and can be equivalent to the SAOC decoder 820, the separate decoder and mixer 920, the integrated decoder and mixer 950 or the SAOC to MPEG Surround transcoder 980.
3.2 INTRODUCTION TO SAOC DECODER OPERATION
Below, a brief introduction to the operation of the SAOC decoder 200 will be given.
Within the general SAOC system, the distortion control unit (DCU) is incorporated into the SAOC decoder / transcoder processing chain between the interpretation interface (for example, a user interface in which the interpretation matrix specified by the user or information from which the user-specified interpretation matrix can be derived, is inserted) and the actual SAOC decoding / transcoding unit.
The distortion control unit 240 provides a modified interpretation matrix $M_{ren,lim}$ using the information from the interpretation interface (for example, the user-specified interpretation matrix entered, directly or indirectly, via the interpretation interface or user interface) and SAOC data (for example, data of the SAOC bit stream 212). For more details, reference is made to Figure 2. The modified interpretation matrix $M_{ren,lim}$, which reflects the actually effective interpretation settings, can be accessed by the application (for example, the SAOC decoding/transcoding unit 248).
Based on the interpretation scenario specified by the user, represented by the (user-specified) interpretation matrix $M_{ren}^{l,m}$ and its elements, the DCU prevents extreme interpretation configurations by producing a modified matrix $M_{ren,lim}^{l,m}$ comprising limited interpretation coefficients, which are to be used by the SAOC interpretation mechanism. For all SAOC operating modes, the final (DCU-processed) interpretation coefficients are calculated according to:
$$M_{ren,lim}^{l,m} = (1 - g_{DCU})\, M_{ren}^{l,m} + g_{DCU}\, M_{ren,tar}^{l,m}.$$
The parameter $g_{DCU} \in [0, 1]$, which is also designated as a linear combination parameter, is used to define the degree of transition from the interpretation matrix specified by the user $M_{ren}^{l,m}$ to the distortion-free target matrix $M_{ren,tar}^{l,m}$.
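The transition described above can be sketched as follows, assuming that the linear combination weights the user-specified matrix with one minus the parameter and the target matrix with the parameter, which matches the statement that a value of 0 selects only the user-specified matrix and a value of 1 only the target matrix. Matrices are represented as lists of rows; the function name is an assumption of this sketch.

```python
def limit_interpretation_matrix(m_ren, m_ren_tar, g_dcu):
    """Blend the user-specified matrix with the target matrix.

    g_dcu = 0 returns m_ren unchanged, g_dcu = 1 returns m_ren_tar.
    Both matrices must have the same shape (output channels x objects).
    """
    if not 0.0 <= g_dcu <= 1.0:
        raise ValueError("linear combination parameter must lie in [0, 1]")
    return [
        [(1.0 - g_dcu) * user + g_dcu * target
         for user, target in zip(row_user, row_target)]
        for row_user, row_target in zip(m_ren, m_ren_tar)
    ]
```

In a decoder, this blend would be evaluated separately for every parameter time interval l and every processing band m.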
The parameter $g_{DCU}$ is derived from the bit stream element "bsDcuParam" according to: $g_{DCU}$ = DcuParam[bsDcuParam].
Accordingly, a linear combination of the interpretation matrix specified by the user $M_{ren}$ and the distortion-free target interpretation matrix $M_{ren,tar}$ is formed depending on the linear combination parameter $g_{DCU}$. The linear combination parameter $g_{DCU}$ is derived from a bit stream element, so that no difficult computation of said linear combination parameter is required (at least on the decoder side). Also, deriving the linear combination parameter $g_{DCU}$ from the bit stream, which includes the downmix signal representation 210, the SAOC bit stream 212 and the bit stream element that represents the linear combination parameter, gives an audio signal encoder a chance to partially control the distortion control mechanism, which is carried out on the side of the SAOC decoder.
There are two possible versions of the distortion-free target matrix $M_{ren,tar}^{l,m}$, suitable for different applications. The version is selected by the bit stream element "bsDcuMode":
• ("bsDcuMode" = 0): The "downmix-like" interpretation, where $M_{ren,tar}^{l,m}$ corresponds to the energy-normalized downmix matrix.
• ("bsDcuMode" = 1): The "best effort" interpretation, where $M_{ren,tar}^{l,m}$ is defined as a function of both the downmix matrix and the interpretation matrix specified by the user.
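The mode switch can be illustrated by a short sketch. The two callables that compute the respective target matrices are hypothetical placeholders supplied by the caller; only the mapping of the values 0 and 1 of "bsDcuMode" onto the two modes is taken from the list above.

```python
from enum import IntEnum


class DcuMode(IntEnum):
    DOWNMIX_LIKE = 0   # "bsDcuMode" = 0
    BEST_EFFORT = 1    # "bsDcuMode" = 1


def select_target_matrix(bs_dcu_mode, downmix_like_target, best_effort_target):
    """Return the target matrix for the signalled distortion control mode.

    downmix_like_target / best_effort_target: zero-argument callables
    (hypothetical) that compute the respective target matrix on demand.
    """
    mode = DcuMode(bs_dcu_mode)  # raises ValueError for unknown mode values
    if mode is DcuMode.DOWNMIX_LIKE:
        return downmix_like_target()
    return best_effort_target()
```

Passing callables rather than precomputed matrices means that only the target matrix of the signalled mode is actually computed.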
To summarize, there are two modes of distortion control, called the "downmix-like" interpretation and the "best effort" interpretation, which can be selected according to the bit stream element "bsDcuMode". These two modes differ in the way in which the target interpretation matrix is computed. Below, the computation of the target interpretation matrix for the two modes, the "downmix-like" interpretation and the "best effort" interpretation, will be described in detail.
3.3 "DOWNMIX-LIKE" INTERPRETATION
3.3.1 INTRODUCTION
The "downmix-like" interpretation method can typically be used in cases where the downmix is an important reference of high artistic quality. The "downmix-like" interpretation matrix $M_{ren,DS}^{l}$ is computed as
$$M_{ren,DS}^{l} = \sqrt{N_{DS}^{l}}\; D_{DS}^{l},$$
where $N_{DS}^{l}$ represents an energy normalization scalar (for each parameter time interval l) and $D_{DS}^{l}$ is the downmix matrix $D^{l}$ extended by rows of zero elements so that the number and order of the rows of $D_{DS}^{l}$ correspond to the constellation of $M_{ren}$.
For example, in the SAOC stereo-to-multichannel transcoding mode, $N_{MPS} = 6$. Accordingly, $D_{DS}^{l}$ is of size $N_{MPS} \times N$ (where N denotes the number of input audio objects), and its rows representing the front left and front right output channels are equal to $D^{l}$ (or to the corresponding rows of $D^{l}$).
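The construction of the extended downmix matrix can be sketched as follows. The sketch assumes that the target matrix is the extended downmix matrix scaled by the square root of the energy normalization scalar, and that the front left and front right channels occupy rows 0 and 1 of a six-row layout; this row order and all names are assumptions, and the energy normalization scalar is treated as a given input.

```python
import math


def extend_downmix_matrix(d, num_output_channels, row_positions):
    """Copy the rows of d into an otherwise all-zero matrix.

    row_positions[k] is the output-channel row that receives row k of d
    (hypothetical layout, e.g. (0, 1) for front left / front right).
    """
    num_objects = len(d[0])
    d_ds = [[0.0] * num_objects for _ in range(num_output_channels)]
    for src_row, dst_row in enumerate(row_positions):
        d_ds[dst_row] = list(d[src_row])
    return d_ds


def downmix_like_target(d_ds, n_ds):
    """Scale the extended downmix matrix by sqrt of the normalization scalar."""
    scale = math.sqrt(n_ds)
    return [[scale * value for value in row] for row in d_ds]
```

For a stereo downmix of three objects and a 5.1 output, `extend_downmix_matrix(d, 6, (0, 1))` yields a 6 x 3 matrix whose last four rows are zero.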
To facilitate the understanding of the above, the following definitions of the interpretation matrix and the downmix matrix must be considered.
The (modified) interpretation matrix $M_{ren,lim}$ applied to the input audio objects S determines the target interpreted output as $Y = M_{ren,lim} S$. The (modified) interpretation matrix $M_{ren,lim}$ with elements $m_{i,j}$ maps all input objects i (i.e., input objects having object index i) to the desired output channels j (i.e., output channels having channel index j). The (modified) interpretation matrix $M_{ren,lim}$ has one row per output channel and one column per input object, that is, it is of size $6 \times N$ for the 5.1 output configuration, of size $2 \times N$ for the stereo output configuration, and of size $1 \times N$ for the mono output configuration.
The same dimensions typically also apply to the interpretation matrix specified by the user $M_{ren}$ and the target interpretation matrix $M_{ren,tar}$.
The downmix matrix D applied to the input audio objects S (in an audio encoder) determines the downmix signal as X = DS.
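The two matrix relations above (the interpreted output obtained by applying the interpretation matrix to S, and the downmix X = DS) can be illustrated with a small pure-Python sketch, in which S holds one row of samples per audio object. The helper name and the example values are assumptions for illustration.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


# Two objects, two samples each (illustrative values).
S = [[1.0, 2.0],
     [3.0, 4.0]]

D = [[0.5, 0.5]]                      # mono downmix matrix, size 1 x N
X = matmul(D, S)                      # downmix signal X = D S

M_ren_lim = [[1.0, 0.0],              # stereo interpretation: object 0 left,
             [0.0, 1.0]]              # object 1 right
Y = matmul(M_ren_lim, S)              # target interpreted output
```

The decoder never has access to S itself; the sketch only shows what the two matrices mean with respect to the original objects.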
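For the case of a stereo downmix, the downmix matrix D of size $2 \times N$ (also designated with $D^{l}$, to indicate a possible time dependence) with elements $d_{i,j}$ ($i = 0, 1$; $j = 0, \ldots, N-1$) is obtained (in an audio decoder) from the DMG and DCLD parameters as
$$d_{0,j} = 10^{0.05\,DMG_j} \sqrt{\frac{10^{0.1\,DCLD_j}}{1 + 10^{0.1\,DCLD_j}}}, \qquad d_{1,j} = 10^{0.05\,DMG_j} \sqrt{\frac{1}{1 + 10^{0.1\,DCLD_j}}}.$$
For the case of a mono downmix, the downmix matrix $D^{l}$ of size $1 \times N$ with elements $d_{i,j}$ ($i = 0$; $j = 0, \ldots, N-1$) is obtained (in an audio decoder) from the DMG parameters as
$$d_{0,j} = 10^{0.05\,DMG_j}.$$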
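A minimal sketch of this derivation is given below. It assumes that the dequantized DMG and DCLD values are expressed in dB and that the gain is split between the two downmix channels according to the level difference, as is customary for SAOC; the function names are hypothetical.

```python
import math


def stereo_downmix_matrix(dmg_db, dcld_db):
    """Return a 2 x N downmix matrix from per-object DMG and DCLD values (dB)."""
    row_left, row_right = [], []
    for dmg, dcld in zip(dmg_db, dcld_db):
        gain = 10.0 ** (0.05 * dmg)       # overall downmix gain (amplitude)
        ratio = 10.0 ** (0.1 * dcld)      # left/right power ratio
        row_left.append(gain * math.sqrt(ratio / (1.0 + ratio)))
        row_right.append(gain * math.sqrt(1.0 / (1.0 + ratio)))
    return [row_left, row_right]


def mono_downmix_matrix(dmg_db):
    """Return a 1 x N downmix matrix from per-object DMG values (dB)."""
    return [[10.0 ** (0.05 * dmg) for dmg in dmg_db]]
```

With this split, the squared elements of each column of the stereo matrix sum to the squared overall gain, so DCLD only distributes the energy between the two downmix channels.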
The DMG and DCLD downmix parameters are obtained from the SAOC bit stream 212.
3.3.2 COMPUTATION OF THE ENERGY NORMALIZATION SCALAR FOR ALL SAOC DECODING/TRANSCODING MODES
For all SAOC decoding/transcoding modes, the energy normalization scalar $N_{DS}^{l}$ is computed using the following equation:
3.4 "BEST EFFORT" INTERPRETATION
3.4.1 INTRODUCTION
The "best effort" method of interpretation can typically be used in cases where the target interpretation is an important reference.
The "best effort" interpretation matrix describes a target interpretation matrix, which depends on the downmix and on the interpretation information. The energy normalization is represented by a matrix of size $N_{MPS} \times M$; therefore, it provides individual values for each output channel. This requires different calculations for the different SAOC operating modes, which are outlined below. The "best effort" interpretation matrix is computed as
$$M_{ren,BE}^{l,m} = M_{ren,tar}^{l,m} = \sqrt{N_{BE}^{l,m}}\; D^{l},$$ for the following SAOC modes "x-1-1/2/5/b", "x-2-1/b",
, for the following SAOC modes "x-2-2 / 5".
Here, Dz is the downmix matrix and represents the energy normalization matrix.
The square root operator in the above equation designates an element mode square root formation.
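The mode-dependent computation of the target matrix can be sketched as follows. The grouping of the modes is taken from the text. That the second group ("x-2-2", "x-2-5") uses the plain product of the energy normalization matrix and the downmix matrix is an assumption, since that equation is not legible here. The names `best_effort_target` and `matmul` are hypothetical.

```python
import math

# Modes for which the text applies an element-wise square root.
SQRT_MODES = {"x-1-1", "x-1-2", "x-1-5", "x-1-b", "x-2-1", "x-2-b"}
# Modes assumed (not confirmed by the text) to use N_BE without a square root.
PLAIN_MODES = {"x-2-2", "x-2-5"}


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def best_effort_target(n_be, d, mode):
    """Return the target matrix built from N_BE and D for the given SAOC mode."""
    if mode in SQRT_MODES:
        n_be = [[math.sqrt(v) for v in row] for row in n_be]
    elif mode not in PLAIN_MODES:
        raise ValueError("unknown SAOC mode: " + mode)
    return matmul(n_be, d)
```

In mode "x-1-2", for instance, a 2×1 matrix N_BE and a 1×N downmix matrix give a 2×N target matrix whose rows are scaled copies of the downmix row.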
Next, the computation of the value N_BE, which can be an energy normalization scalar in the case of the SAOC mono-to-mono decoding mode and an energy normalization matrix in the case of the other decoding or transcoding modes, will be discussed in detail.
3.4.2 SAOC MONO-TO-MONO ("x-1-1") DECODING MODE
For SAOC mode "x-1-1", in which a mono downmix signal is decoded to obtain a mono output signal (as an upmix signal representation), the energy normalization scalar is computed using the following equation
3.4.3 SAOC MONO-TO-STEREO ("x-1-2") DECODING MODE
For SAOC mode "x-1-2", in which a mono downmix signal is decoded to obtain a stereo (two-channel) output signal (as an upmix signal representation), the energy normalization matrix of size 2×1 is computed using the following equation
3.4.4 SAOC MONO-TO-BINAURAL ("x-1-b") DECODING MODE
For SAOC mode "x-1-b", in which a mono downmix signal is decoded to obtain a binaurally interpreted output signal (as an upmix signal representation), the energy normalization matrix of size 2×1 is computed using the following equation

The elements a^{l,m} comprise (or are taken from) the target binaural interpretation matrix A^{l,m}.
3.4.5 SAOC STEREO-TO-MONO ("x-2-1") DECODING MODE
For SAOC mode "x-2-1", in which a two-channel (stereo) downmix signal is decoded to obtain a one-channel (mono) output signal (as an upmix signal representation), the energy normalization matrix of size 1×2 is computed using the following equation
where M_ren^l is the mono interpretation matrix of size 1×N.
3.4.6 SAOC STEREO-TO-STEREO ("x-2-2") DECODING MODE
For SAOC mode "x-2-2", in which a stereo downmix signal is decoded to obtain a stereo output signal (as an upmix signal representation), the energy normalization matrix of size 2×2 is computed using the following equation
where M_ren^l is the stereo interpretation matrix of size 2×N.
3.4.7 SAOC STEREO-TO-BINAURAL ("x-2-b") DECODING MODE
For SAOC mode "x-2-b", in which a stereo downmix signal is decoded to obtain a binaurally interpreted output signal (as an upmix signal representation), the energy normalization matrix of size 2×2 is computed using the following equation
where A^{l,m} is a binaural interpretation matrix of size 2×N.
3.4.8 SAOC MONO-TO-MULTICHANNEL ("x-1-5") TRANSCODING MODE
For SAOC mode "x-1-5", in which a mono downmix signal is transcoded to obtain a 5-channel or 6-channel output signal (as an upmix signal representation), the energy normalization matrix of size N_MPS × 1 is computed using the following equation
3.4.9 SAOC STEREO-TO-MULTICHANNEL ("x-2-5") TRANSCODING MODE
For SAOC mode "x-2-5", in which a stereo downmix signal is transcoded to obtain a 5-channel or 6-channel output signal (as an upmix signal representation), the energy normalization matrix of size N_MPS × 2 is computed using the following equation
3.4.10 COMPUTATION OF J^l
To avoid numerical problems when calculating the term (D^l (D^l)^*)^{-1} in 3.4.5, 3.4.6, 3.4.7 and 3.4.9, J^l is modified in some embodiments. First, the eigenvalues λ_1,2 of J^l are calculated by solving det(J^l - λ_1,2 I) = 0. The eigenvalues are sorted in descending order (λ_1 ≥ λ_2) and the eigenvector corresponding to the larger eigenvalue is calculated according to the equation above. It is ensured that this eigenvector lies in the positive x-plane (the first element must be positive). The second eigenvector is obtained from the first by a rotation of -90 degrees:
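The steps above (eigenvalues in descending order, first eigenvector with a positive first element, second eigenvector by a rotation of -90 degrees) can be sketched for a real symmetric 2×2 matrix. That the matrix is real and symmetric is an assumption made for this illustration, and the function name `eig_sym_2x2` is hypothetical. How the matrix is subsequently modified is not reproduced in this text and is therefore not shown.

```python
import math


def eig_sym_2x2(j):
    """Eigen-decomposition of a real symmetric 2x2 matrix [[a, b], [b, c]].

    Returns ((lam1, lam2), (v1, v2)) with lam1 >= lam2. v1 has a non-negative
    first element; v2 is v1 rotated by -90 degrees.
    """
    a, b, c = j[0][0], j[0][1], j[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam1, lam2 = mean + radius, mean - radius  # roots of det(J - lam*I) = 0

    # Two algebraically equivalent eigenvector candidates; pick the longer
    # one so that a vanishing off-diagonal element does not give a zero vector.
    candidates = [(b, lam1 - a), (lam1 - c, b)]
    x, y = max(candidates, key=lambda v: math.hypot(v[0], v[1]))
    norm = math.hypot(x, y)
    if norm == 0.0:
        x, y = 1.0, 0.0  # J is a multiple of the identity: any direction works
    else:
        x, y = x / norm, y / norm
    if x < 0.0:
        x, y = -x, -y  # keep the first element non-negative

    v1 = (x, y)
    v2 = (y, -x)  # rotation of v1 by -90 degrees
    return (lam1, lam2), (v1, v2)
```

For a symmetric matrix the rotated vector is orthogonal to the first eigenvector and is therefore itself an eigenvector for the smaller eigenvalue, which is why the rotation suffices.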
3.4.11 APPLICATION OF THE DISTORTION CONTROL UNIT (DCU) FOR ENHANCED AUDIO OBJECTS (EAO)
In the following, some optional extensions in relation to the application of the distortion control unit will be described, which can be implemented in some embodiments, according to the invention.
For SAOC decoders that decode residual coding data and therefore support the handling of EAOs, it may be meaningful to provide a second parameterization of the DCU that makes it possible to take advantage of the enhanced audio quality provided by the use of EAOs. This is achieved by decoding and using a second, alternative set of DCU parameters (i.e., bsDcuMode2 and bsDcuParam2), which is additionally transmitted as part of the data structures that contain residual data (i.e., SAOCExtensionConfigData() and SAOCExtensionFrameData()). An application can make use of this second set of parameters if it decodes residual coding data and operates in the strict EAO mode, which is defined by the condition that only EAOs can be modified arbitrarily while all non-EAOs only undergo a single common modification. Specifically, this strict EAO mode requires the following two conditions to be met:
The downmix matrix and the interpretation matrix have the same dimensions (implying that the number of interpretation channels is equal to the number of downmix channels).
The application only employs interpretation coefficients for each of the regular objects (i.e., non-EAOs) that are related to their corresponding downmix coefficients by a single common scaling factor.
4. BIT STREAM ACCORDING TO FIGURE 3a
In the following, a bit stream representing a multichannel audio signal will be described with reference to Figure 3a which presents a graphical representation of that bit stream 300.
Bit stream 300 comprises a downmix signal representation 302, which is a representation (e.g., a coded representation) of a downmix signal that combines the audio signals of a plurality of audio objects. Bit stream 300 also comprises object-related parametric side information 304 that describes characteristics of the audio objects and, typically, also characteristics of a downmix performed in an audio encoder. The object-related parametric information 304 preferably comprises object level difference information OLD, inter-object correlation information IOC, downmix gain information DMG and downmix channel level difference information DCLD. The bit stream 300 also comprises a linear combination parameter 306 that describes the desired contributions of a user-specified interpretation matrix and of a target interpretation matrix to a modified interpretation matrix (to be applied by an audio signal decoder).
Additional optional details regarding this bit stream 300 will be described below with reference to Figures 3b and 3c. The bit stream 300 can be provided by device 150 as bit stream 170. It can be input into device 100 to obtain the downmix signal representation 110, the object-related parametric information 112 and the linear combination parameter 140, or into device 200 to obtain the downmix information 210, the SAOC bit stream information 212 and the linear combination parameter 214.
5. DETAILS OF THE BIT STREAM SYNTAX
5.1. SAOC SPECIFIC CONFIGURATION SYNTAX
Figure 3b presents a detailed syntax representation of SAOC specific configuration information.
The specific SAOC configuration 310, according to Figure 3b, can, for example, be part of a header of the bit stream 300, according to Figure 3a.
The specific SAOC configuration may, for example, comprise a sampling frequency configuration that describes a sampling frequency to be applied by a SAOC decoder. The specific SAOC configuration also comprises a low delay mode configuration, which describes whether a low delay mode or a high delay mode of the signal processor 148 or of the SAOC decoding/transcoding unit 248 should be used. The specific SAOC configuration also comprises a frequency resolution configuration that describes a frequency resolution to be used by the signal processor 148 or the SAOC decoding/transcoding unit 248. In addition, the specific SAOC configuration may comprise a frame length configuration describing a length of the audio frames to be used by the signal processor 148 or the SAOC decoding/transcoding unit 248. In addition, the specific SAOC configuration typically comprises an object number configuration that describes a number of audio objects to be processed by the signal processor 148 or the SAOC decoding/transcoding unit 248. The object number configuration also describes a number of object-related parameters included in the object-related parametric information 112 or in the SAOC bit stream 212. The specific SAOC configuration may comprise an object relationship configuration, which designates objects that have common object-related parametric information. The specific SAOC configuration may also comprise an absolute energy transmission configuration, which indicates whether absolute energy information is transmitted from an audio encoder to an audio decoder. The specific SAOC configuration may also comprise a downmix channel number configuration, which indicates whether there is only one downmix channel, whether there are two downmix channels, or whether there are, optionally, more than two downmix channels. In addition, the specific SAOC configuration may comprise additional configuration information in some embodiments.
The specific SAOC configuration can also comprise postprocessing downmix gain configuration information "bsPdgFlag" that defines whether a postprocessing downmix gain for optional postprocessing is transmitted.
The specific SAOC configuration also comprises a "bsDcuFlag" indicator (which can, for example, be a 1-bit indicator), which defines whether the "bsDcuMode" and "bsDcuParam" values are transmitted in the bit stream. If that "bsDcuFlag" indicator has a value of "1", another indicator, which is designated "bsDcuMandatory", and a "bsDcuDynamic" indicator are included in the specific SAOC configuration 310. The "bsDcuMandatory" indicator describes whether the distortion control must be applied by an audio decoder. If the "bsDcuMandatory" indicator is equal to "1", then the distortion control unit must be applied using the parameters "bsDcuMode" and "bsDcuParam" as transmitted in the bit stream. If the "bsDcuMandatory" indicator is equal to "0", then the distortion control unit parameters "bsDcuMode" and "bsDcuParam" transmitted in the bit stream are only recommended values, and other distortion control unit settings could also be used.
In other words, an audio encoder can activate the "bsDcuMandatory" indicator in order to force the use of the distortion control mechanism in a standard-compliant audio decoder, and can deactivate said indicator in order to leave to the audio decoder the decision whether to apply the distortion control unit and, if so, which parameters to use for the distortion control unit.
The "bsDcuDynamic" indicator allows dynamic signaling of the "bsDcuMode" and "bsDcuParam" values. If the "bsDcuDynamic" indicator is deactivated, the "bsDcuMode" and "bsDcuParam" parameters are included in the specific SAOC configuration; otherwise, the "bsDcuMode" and "bsDcuParam" parameters are included in the SAOC frames, or at least in some of the SAOC frames, as will be discussed later. Accordingly, an audio signal encoder can switch between a one-time signaling (per piece of audio comprising a single specific SAOC configuration and, typically, a plurality of SAOC frames) and a dynamic transmission of said parameters within some or all of the SAOC frames.
The parameter "bsDcuMode" defines the type of distortion-free target matrix for the distortion control unit (DCU), according to the table in Figure 3d.
The parameter "bsDcuParam" defines the parameter value for the distortion control unit (DCU) algorithm, according to the table in Figure 3e. In other words, the 4-bit parameter "bsDcuParam" defines an index value idx, which can be mapped by an audio signal decoder to a linear combination value g_DCU (also referred to as "DcuParam[ind]" or "DcuParam[idx]"). Thus, the parameter "bsDcuParam" represents the linear combination parameter in quantized form.
As can be seen in Figure 3b, the parameters "bsDcuMandatory", "bsDcuDynamic", "bsDcuMode" and "bsDcuParam" are set to a default value of "0", if the "bsDcuFlag" indicator has a value of "0", which indicates that the parameters of the distortion control unit are not transmitted.
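The conditional presence of the DCU fields in the specific SAOC configuration can be sketched as follows. The 1-bit width of "bsDcuFlag" and the 4-bit width of "bsDcuParam" are stated in the text. The order of the fields, the 1-bit widths of "bsDcuMandatory" and "bsDcuDynamic", and the width of "bsDcuMode" (parameter `mode_bits`) are assumptions, since Figure 3b is not reproduced here. `BitReader` and `parse_dcu_config` are hypothetical helpers.

```python
class BitReader:
    """Minimal MSB-first reader over a string of '0'/'1' characters."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read(self, n):
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("bit stream too short")
        self.pos += n
        return int(chunk, 2)


def parse_dcu_config(reader, mode_bits=1):
    """Read the DCU part of the configuration; absent fields default to 0."""
    cfg = {"bsDcuFlag": reader.read(1), "bsDcuMandatory": 0,
           "bsDcuDynamic": 0, "bsDcuMode": 0, "bsDcuParam": 0}
    if cfg["bsDcuFlag"] == 1:
        cfg["bsDcuMandatory"] = reader.read(1)  # assumed 1 bit
        cfg["bsDcuDynamic"] = reader.read(1)    # assumed 1 bit
        if cfg["bsDcuDynamic"] == 0:
            # Static signaling: mode and parameter are carried in the header.
            cfg["bsDcuMode"] = reader.read(mode_bits)
            cfg["bsDcuParam"] = reader.read(4)
    return cfg
```

With dynamic signaling enabled, the sketch leaves "bsDcuMode" and "bsDcuParam" at their default value of 0 until they are read from a SAOC frame.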
The specific SAOC configuration also optionally comprises one or more byte alignment bytes "ByteAlign ()" to bring the specific SAOC configuration to a desired length.
In addition, the specific SAOC configuration can optionally comprise a SAOC extension configuration "SAOCExtensionConfig()", which comprises additional configuration parameters. However, said configuration parameters are not relevant to the present invention, so that a discussion is omitted here for the sake of brevity.
5.2. SAOC FRAME SYNTAX
In the following, the syntax of a SAOC frame will be described with reference to Figure 3c.
The SAOC frame "SAOCFrame" typically comprises encoded object level difference values OLD, as discussed above, which can be included in the SAOC frame data for a plurality of frequency bands (per band) and for a plurality of audio objects (per audio object).
The SAOC frame also optionally comprises encoded absolute energy values NRG, which can be included for a plurality of frequency bands (per band).
The SAOC frame can also comprise encoded inter-object correlation values IOC, which are included in the SAOC frame data for a plurality of combinations of audio objects. The IOC values are typically included per band.
The SAOC frame also comprises encoded downmix gain values DMG, where there is typically one downmix gain value per audio object per SAOC frame.
The SAOC frame also optionally comprises encoded downmix channel level differences DCLD, where there is typically one downmix channel level difference value per audio object per SAOC frame.
Also, the SAOC frame optionally comprises encoded post-processing downmix gain values PDG.
In addition, a SAOC frame may also comprise, in some circumstances, one or more distortion control parameters. If the "bsDcuFlag" indicator, which is included in the specific SAOC configuration section, is equal to "1", which indicates the use of distortion control unit information in the bit stream, and if the "bsDcuDynamic" indicator in the specific SAOC configuration also has a value of "1", which indicates the use of dynamic (per-frame) distortion control unit information, the distortion control information is included in the SAOC frame, provided that the SAOC frame is a so-called "independent" SAOC frame, for which the "bsIndependencyFlag" indicator is active, or that the "bsDcuDynamicUpdate" indicator is active.
It should be noted here that the "bsDcuDynamicUpdate" indicator is only included in the SAOC frame if the "bsIndependencyFlag" indicator is inactive and that the "bsDcuDynamicUpdate" indicator defines whether the "bsDcuMode" and "bsDcuParam" values are updated. More precisely, "bsDcuDynamicUpdate" == 1 means that the values "bsDcuMode" and "bsDcuParam" are updated in the current frame, while "bsDcuDynamicUpdate" == 0 means that the previously transmitted values are maintained.
Accordingly, the parameters "bsDcuMode" and "bsDcuParam", which were explained above, are included in the SAOC frame if the transmission of the distortion control unit parameters is enabled, the dynamic transmission of the distortion control unit data is also enabled and the "bsDcuDynamicUpdate" indicator is active. In addition, the parameters "bsDcuMode" and "bsDcuParam" are also included in the SAOC frame if the SAOC frame is an "independent" SAOC frame, the transmission of the distortion control unit data is enabled and the dynamic transmission of the distortion control unit data is also enabled.
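The presence rule for the DCU parameters in a SAOC frame can be condensed into a small predicate. The flag names follow the bit stream elements named in the text (the update flag is written here as bsDcuDynamicUpdate). The function name `frame_carries_dcu_params` is hypothetical and the sketch is for illustration only.

```python
def frame_carries_dcu_params(bs_dcu_flag, bs_dcu_dynamic,
                             bs_independency_flag, bs_dcu_dynamic_update=0):
    """Return True if bsDcuMode and bsDcuParam are present in this SAOC frame."""
    if not (bs_dcu_flag and bs_dcu_dynamic):
        # No DCU data at all, or DCU data is signaled once in the configuration.
        return False
    if bs_independency_flag:
        # Independent frames always carry the parameters; the update flag
        # is not transmitted in this case.
        return True
    # Dependent frames carry the parameters only if an update is flagged.
    return bool(bs_dcu_dynamic_update)
```

A decoder that keeps the last received values can call this predicate once per frame and overwrite its stored "bsDcuMode" and "bsDcuParam" only when it returns True.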
The SAOC frame also optionally comprises fill data "byteAlign()" to fill the SAOC frame to a desired length.
Optionally, the SAOC frame can comprise additional information, which is referred to as "SAOCExtensionFrame()". However, this optional additional SAOC frame information is not relevant to the present invention and, for the sake of brevity, will therefore not be discussed here.
In addition, it should be noted that the "bsIndependencyFlag" indicator indicates whether the lossless coding of the current SAOC frame is done independently of the previous SAOC frame, that is, whether the current SAOC frame can be decoded without knowledge of the previous SAOC frame.
6. SAOC DECODER / TRANSCODER ACCORDING TO FIGURE 4
In the following, additional embodiments of the interpretation coefficient limitation scheme for the control of distortion in SAOC will be described.
6.1. OVERVIEW
Figure 4 shows a schematic block diagram of an audio decoder 400, according to an embodiment of the invention.
The audio decoder 400 is configured to receive a downmix signal 410, a SAOC bit stream 412, a linear combination parameter 414 (also designated with Λ) and interpretation matrix information 420 (also designated with R). The audio decoder 400 is configured to provide an upmix signal representation, for example, in the form of a plurality of output channels 130a to 130M. The audio decoder 400 comprises a distortion control unit 440 (also referred to as DCU) that receives at least a portion of the SAOC bit stream information of the SAOC bit stream 412, the linear combination parameter 414 and the interpretation matrix information 420. The distortion control unit provides modified interpretation information R_lim, which can be modified interpretation matrix information.
The audio decoder 400 also comprises a SAOC decoder and/or SAOC transcoder 448, which receives the downmix signal 410, the SAOC bit stream 412 and the modified interpretation information R_lim and provides, based on that, the output channels 130a to 130M.
In the following, the functionality of the audio decoder 400, which uses one or more interpretation coefficient limiting schemes, in accordance with the present invention, will be discussed in detail.
The general SAOC processing is performed in a time/frequency selective manner and can be described as follows. The SAOC encoder (for example, the SAOC encoder 150) extracts the psychoacoustic characteristics (for example, object energy relationships and correlations) of several input audio object signals and then downmixes them into a mono or stereo downmix signal (for example, the downmix signal 182 or the downmix signal 410). This downmix signal and the extracted side information (for example, the object-related parametric side information or the SAOC bit stream information 412) are transmitted (or stored) in a compressed format using well-known perceptual audio coders. At the receiving end, the SAOC decoder 448 conceptually attempts to restore the original object signals (i.e., to separate the downmixed objects) using the transmitted side information 412. These approximate object signals are then mixed into a target scenario using an interpretation matrix. The interpretation matrix, for example, R or R_lim, is made up of the Interpretation Coefficients (RCs) specified for each transmitted audio object and each loudspeaker of the upmix configuration, which determine the gains and spatial positions of all the separated/interpreted objects.
Effectively, the separation of the object signals is rarely or never performed, since the separation and the mixing are performed in a single combined processing step, which results in a huge reduction in computational complexity. This scheme is very efficient, both in terms of transmission bit rate (only one or two downmix channels 182, 410 plus some side information 186, 188, 412, 414 need to be transmitted, instead of several individual object audio signals) and in terms of computational complexity (the processing complexity depends mainly on the number of output channels rather than on the number of audio objects). The SAOC decoder transforms (at a parametric level) the object gains and other side information directly into the Transcoding Coefficients (TCs), which are applied to the downmix signal 182, 410 to create the corresponding signals 130a to 130M for the interpreted output audio scenario (or a pre-processed downmix signal for an additional decoding operation, that is, typically multi-channel MPEG Surround interpretation).
The subjectively perceived audio quality of the interpreted output scenario can be improved by applying a distortion control unit DCU (for example, an interpretation matrix modification unit), as described in [6]. This improvement can be achieved at the price of accepting a moderate dynamic modification of the target interpretation settings. The modification of the interpretation information can vary in time and frequency, which, in specific circumstances, can result in unnatural sound colorations and/or temporal fluctuation artifacts.
Within the general SAOC system, the DCU can be incorporated into the SAOC decoder/transcoder processing chain in a straightforward manner. Namely, it is placed at the front end of the SAOC processing, where it controls the RCs R, see Figure 4.
6.2. IMPLIED HYPOTHESIS
The implicit hypothesis of the indirect control method assumes a relationship between the level of distortion and the deviations of the RCs from the levels of their corresponding objects in the downmix. This is based on the observation that the more specific attenuation/boosting is applied by the RCs to a particular object in relation to the other objects, the more aggressive a modification of the transmitted downmix signal must be carried out by the SAOC decoder/transcoder. In other words: the greater the deviation of the "object gain" values relative to each other, the greater the chance that an unacceptable distortion will occur (assuming identical downmix coefficients).
6.3. CALCULATION OF LIMITED INTERPRETATION COEFFICIENTS
Based on the user-specified interpretation scenario represented by the coefficients (the RCs) of a matrix R of size N_ch × N_ob (i.e., the rows correspond to the output channels 130a to 130M, the columns to the input audio objects), the DCU avoids extreme interpretation configurations by producing a modified matrix R_lim comprising limited interpretation coefficients, which are actually used by the SAOC interpretation engine 448. Without loss of generality, in the subsequent description, the RCs are assumed to be frequency invariant to simplify the notation. For all SAOC operating modes, the limited interpretation coefficients can be derived as
\[ R_{lim} = (1 - \Lambda)\, R + \Lambda\, \tilde{R} \]
This means that by incorporating the transition parameter Λ ∈ [0, 1] (also referred to as a linear combination parameter), the (user-specified) interpretation matrix R can be blended towards a target matrix R̃. In other words, the limited matrix R_lim represents a linear combination of the interpretation matrix R and a target matrix. On the one hand, the target interpretation matrix could be the downmix matrix (that is, the downmix channels are passed through the transcoder 448) with a normalization factor, or another static matrix that results in a static transcoding matrix. This "downmix-like interpretation" ensures that the target interpretation matrix does not introduce any SAOC processing artifacts and, consequently, represents an ideal interpretation point in terms of audio quality, despite being totally independent of the initial interpretation coefficients.
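The linear combination described here can be sketched as R_lim = (1 - Λ) R + Λ R̃. The direction of the blend is an assumption: it follows the later statement that a parameter value of 0 disables the DCU, so that Λ = 0 leaves the user-specified matrix unchanged and Λ = 1 yields the target matrix. The function name `limit_interpretation_matrix` is hypothetical.

```python
def limit_interpretation_matrix(r, r_target, lam):
    """Blend the user-specified matrix r towards r_target by the factor lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("linear combination parameter must lie in [0, 1]")
    if len(r) != len(r_target) or any(
            len(a) != len(b) for a, b in zip(r, r_target)):
        raise ValueError("matrices must have the same dimensions")
    return [[(1.0 - lam) * a + lam * b for a, b in zip(row_r, row_t)]
            for row_r, row_t in zip(r, r_target)]
```

An intermediate value such as Λ = 0.5 limits each interpretation coefficient to the midpoint between the requested value and the target value, which is the trade-off between the interpretation objective and distortion that the DCU is meant to control.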
However, if an application requires a specific interpretation scenario, or if a user places high value on the initial interpretation configuration (especially, for example, the spatial position of one or more objects), the downmix-like interpretation fails to serve as a target point. On the other hand, the target point can be defined as a "best effort interpretation" when both the downmix and the initial interpretation coefficients (for example, the user-specified interpretation matrix) are taken into account. The purpose of this second definition of the target interpretation matrix is to preserve the specific interpretation scenario (for example, defined by the user-specified interpretation matrix) in the best possible way, while at the same time keeping the audible degradation due to excessive object manipulation at a minimum level.
6.4. DOWNMIX-LIKE INTERPRETATION
6.4.1 INTRODUCTION
The downmix matrix D of size N_dmx × N_ob is determined by the encoder (for example, the audio encoder 150) and comprises information about how the input objects are linearly combined into the downmix signal that is transmitted to the decoder. For example, with a mono downmix signal, D reduces to a single row vector, and in the case of a stereo downmix N_dmx = 2.
The "downmix-like" interpretation matrix R_DS is computed as
where N_DS represents the energy normalization scalar and D_R is the downmix matrix extended by rows of zero elements, so that the number and order of the rows of D_R correspond to the constellation of R. For example, in the SAOC stereo-to-multichannel transcoding mode (x-2-5), N_dmx = 2 and N_ch = 6. Accordingly, D_R is of size N_ch × N_ob, and its rows that represent the front left and front right output channels equal D.
6.4.2 ALL SAOC DECODING / TRANSCODING MODES
For all SAOC decoding / transcoding modes, the energy normalization scalar N_DS can be computed using the following equation
where the operator trace(X) denotes the sum of all diagonal elements of the matrix X. The (*) denotes the complex conjugate transpose operator.
6.5. BEST EFFORT INTERPRETATION
6.5.1 INTRODUCTION
The best effort interpretation method describes a target interpretation matrix, which depends on the downmix and interpretation information. The energy normalization is represented by a matrix N_BE of size N_ch × N_dmx, so it provides individual values for each output channel (as long as there is more than one output channel). This requires different calculations of N_BE for the different SAOC operation modes, which are outlined in the subsequent sections.
The "best effort interpretation" matrix is computed as
where D is the downmix matrix and N_BE represents the energy normalization matrix.
6.5.2 SAOC MONO-TO-MONO ("x-1-1") DECODING MODE
For SAOC mode "x-1-1", the energy normalization scalar N_BE can be computed using the following equation
6.5.3 SAOC MONO-TO-STEREO ("x-1-2") DECODING MODE
For SAOC mode "x-1-2", the energy normalization matrix N_BE of size 2×1 can be computed using the following equation
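The equation is not reproduced in this text. One reading that is consistent with the term "energy normalization" and with the square root that this section prescribes for the mono downmix modes is a per-output-channel energy ratio between the interpretation coefficients and the downmix coefficients. The sketch below is written under that assumption. The constant `eps` and the function name `mono_energy_normalization` are hypothetical.

```python
def mono_energy_normalization(r, d, eps=1e-9):
    """Per-channel energy ratio for a mono downmix.

    r   : interpretation matrix as a list of rows (one row per output channel)
    d   : mono downmix coefficients as a flat list (one value per object)
    eps : assumed guard against division by zero

    Returns an N_ch x 1 matrix. Under the stated assumption, its element-wise
    square root scales the downmix row to the energy of each row of r.
    """
    d_energy = sum(x * x for x in d) + eps
    return [[(sum(x * x for x in row) + eps) / d_energy] for row in r]
```

Multiplying the downmix row by the square root of each returned value gives a target row that has the same energy as the corresponding row of the interpretation matrix but the object balance of the downmix.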
6.5.4 SAOC MONO-TO-BINAURAL ("x-1-b") DECODING MODE
For SAOC mode "x-1-b", the energy normalization matrix N_BE of size 2×1 can be computed using the following equation

It should also be noted that here r_1 and r_2 consider / incorporate binaural HRTF parameter information.
It should also be noted that for all three equations above, the square root of N_BE must be taken, i.e., √N_BE is applied
(see previous description).
6.5.5 SAOC STEREO-TO-MONO ("x-2-1") DECODING MODE
For SAOC mode "x-2-1", the energy normalization matrix N_BE of size 1×2 can be computed using the following equation
where the mono interpretation matrix R_1 of size 1×N_ob is defined as
6.5.6 SAOC STEREO-TO-STEREO ("x-2-2") DECODING MODE
For SAOC mode "x-2-2", the energy normalization matrix N_BE of size 2×2 can be computed using the following equation
where the stereo interpretation matrix R_2 of size 2×N_ob is defined as
6.5.7 SAOC STEREO-TO-BINAURAL ("x-2-b") DECODING MODE
For SAOC mode "x-2-b", the energy normalization matrix N_BE of size 2×2 can be computed using the following equation
where the binaural interpretation matrix R_2 of size 2×N_ob is defined as

It should also be noted that here r_1,n and r_2,n consider / incorporate binaural HRTF parameter information.
6.5.8 SAOC MONO-TO-MULTICHANNEL ("x-1-5") TRANSCODING MODE
For SAOC mode "x-1-5", the energy normalization matrix N_BE of size N_ch × 1 can be computed using the following equation

Again, taking the square root of each element is recommended or even necessary in some cases.
6.5.9 SAOC STEREO-TO-MULTICHANNEL ("x-2-5") TRANSCODING MODE
For SAOC mode "x-2-5", the energy normalization matrix N_BE of size N_ch × 2 can be computed using the following equation
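The equation is not reproduced in this text. Since the text states that the term (DD*)^{-1} occurs in the stereo downmix modes, a plausible form is the least-squares solution N_BE = R D* (DD*)^{-1}. The sketch below is written under that assumption for real-valued matrices. The diagonal loading `eps` is one simple regularization choice among many, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inv2x2_regularized(m, eps):
    """Inverse of (m + eps*I) for a 2x2 matrix; eps is an assumed safeguard."""
    a, b = m[0][0] + eps, m[0][1]
    c, d = m[1][0], m[1][1] + eps
    det = a * d - b * c
    if det == 0.0:
        raise ZeroDivisionError("matrix is singular even after regularization")
    return [[d / det, -b / det], [-c / det, a / det]]


def stereo_energy_normalization(r, d, eps=1e-9):
    """Assumed least-squares form N_BE = R D^T (D D^T + eps*I)^-1, size N_ch x 2."""
    d_t = transpose(d)
    return matmul(matmul(r, d_t), inv2x2_regularized(matmul(d, d_t), eps))
```

With this form, the product of N_BE and D is the closest approximation of the interpretation matrix that can be built from the two downmix channels alone, which matches the intent of a "best effort" target.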
6.5.10 COMPUTATION OF (DD*)^{-1}
For the computation of the term (DD*)^{-1}, regularization methods can be applied to avoid ill-conditioned matrix results.
6.6. CONTROL OF THE INTERPRETATION COEFFICIENT LIMITATION SCHEMES
6.6.1 EXAMPLE OF THE BIT STREAM SYNTAX
Next, a syntax representation of the specific SAOC configuration will be described with reference to Figure 5a. The specific SAOC configuration "SAOCSpecificConfig()" comprises conventional SAOC configuration information. In addition, the specific SAOC configuration comprises a DCU-specific addition 510, which will be described in more detail below. The specific SAOC configuration also comprises one or more fill bits "ByteAlign()", which can be used to adjust the length of the specific SAOC configuration. In addition, the specific SAOC configuration may optionally comprise a SAOC extension configuration, which comprises additional configuration parameters.
The specific addition of DCU 510, according to Figure 5a, to the bitstream syntax element "SAOCSpecificConfig ()", is an example of bitstream signaling for the proposed DCU scheme. This refers to the syntax described in the subclause "5.1 payloads for SAOC" of the draft SAOC Standard, according to reference [8].
Next, the definition of some of the parameters will be given.
"bsDcuFlag" Defines whether the settings for the DCU are determined by the SAOC encoder or by the SAOC decoder/transcoder. More precisely, "bsDcuFlag" = 1 means that the values "bsDcuMode" and "bsDcuParam" specified in SAOCSpecificConfig() by the SAOC encoder are applied to the DCU, while "bsDcuFlag" = 0 means that the variables "bsDcuMode" and "bsDcuParam" (initialized by default values) can be further modified by the SAOC decoder/transcoder application or by the user.
"bsDcuMode" Defines the DCU mode. More precisely, "bsDcuMode" = 0 means that the "downmix-like" interpretation mode is applied by the DCU, while "bsDcuMode" = 1 means that the "best effort" interpretation mode is applied by the DCU algorithm.
"bsDcuParam" Defines the combination parameter value for the DCU algorithm, where the table in Figure 5b presents a quantization table for the "bsDcuParam" parameters.
The possible "bsDcuParam" values are, in this example, part of a table with 16 entries represented by 4 bits. In fact, any table, larger or smaller, could be used. The spacing between the values can be logarithmic in order to correspond to the maximum object separation in decibels. However, the values could also be linearly spaced, or follow a hybrid combination of logarithmic and linear spacing, or any other type of scale.
The parameter "bsDcuMode" in the bit stream makes it possible to choose a DCU algorithm that is suited to the situation. This can be very useful, since some applications or content could benefit from the "downmix-like" interpretation mode, while others could benefit from the "best effort" interpretation mode.
Typically, the "downmix-like" interpretation mode may be the desired method for applications where backward/forward compatibility is important and the downmix has important artistic qualities that need to be preserved. On the other hand, the "best effort" interpretation method can perform better in cases where this is not the case.
These DCU parameters related to the present invention could, in fact, be transmitted in any other parts of the SAOC bit stream. An alternative location would be the "SAOCExtensionConfig()" container, where a given extension ID could be used. Both of these sections are located in the SAOC header, ensuring minimal data rate overhead.
Another alternative is to transmit the DCU data in the payload data (that is, in SAOCFrame ()). This would allow time-varying signaling (for example, adaptive signal control).
A flexible approach is to define the bitstream signaling of the DCU data for both the head (i.e., dynamic signaling) and the payload data (i.e., dynamic signaling). So, a SAOC encoder is free to choose one of the two signaling methods. 6.7. PROCESSING STRATEGY
If the DCU settings (for example, the DCU mode "bsDcuMode" and the combination parameter setting "bsDcuParam") are explicitly specified by the SAOC encoder (for example, "bsDcuFlag" = 1), the SAOC decoder/transcoder applies these values directly to the DCU. If the DCU settings are not explicitly specified (for example, "bsDcuFlag" = 0), the SAOC decoder/transcoder uses the default values and allows the SAOC decoder/transcoder application or the user to modify them. The first quantization index (for example, idx = 0) can be used to disable the DCU. Alternatively, the default DCU value ("bsDcuParam") can be "0", that is, disabling the DCU, or "1", that is, full limiting.
7. PERFORMANCE ASSESSMENT
7.1. HEARING TEST PROJECT
A subjective hearing test was conducted to assess the perceptual performance of the proposed DCU concept and to compare it with the results of regular SAOC RM decoding/transcoding processing. In contrast to other hearing tests, the task of this test is to consider the best possible reproduction quality in extreme interpretation situations ("soloing objects", "muting objects") with respect to two quality aspects:
1. achieving the interpretation objective (good attenuation/boosting of the target objects)
2. sound quality of the overall scenario (considering distortions, artifacts, unnaturalness, ...)
Note that unmodified SAOC processing can fulfil aspect no. 1 but not aspect no. 2, whereas simply using the transmitted downmix signal can fulfil aspect no. 2 but not aspect no. 1.
The hearing test was conducted by presenting only real choices to the listener, that is, only material that is actually available as a signal on the decoder side. Thus, the signals presented are the output signal of the regular SAOC decoder (not processed by the DCU), which demonstrates the baseline performance of SAOC, and the SAOC/DCU output. In addition, the trivial interpretation case, which corresponds to the downmix signal, is presented in the hearing test.
The table in Figure 6a describes the hearing test conditions.
Since the proposed DCU operates on regular SAOC data and downmixes and does not depend on residual information, no core coder was applied to the corresponding SAOC downmix signals.
7.2. HEARING TEST ITEMS
The following items, together with extreme and critical interpretations, were chosen for the current hearing test from the CfP hearing test material.
The table in Figure 6b describes the audio items of the hearing tests.
7.3. DOWNMIX AND INTERPRETATION SETTINGS
The object interpretation gains described in the table in Figure 6c were applied to the considered upmix scenarios.
7.4. HEARING TEST INSTRUCTIONS
The subjective hearing tests were conducted in an acoustically isolated listening room designed to permit high-quality listening. Playback was performed using headphones (STAX SR Lambda Pro with Lake-People D/A converter and STAX SRM-Monitor).
The test method followed the procedure used in the spatial audio verification tests, similar to the "Multiple Stimuli with Hidden Reference and Anchors" (MUSHRA) method for the subjective evaluation of intermediate quality audio [7]. The test method was modified, as described above, in order to assess the perceptual performance of the proposed DCU. Listeners were instructed to adhere to the following hearing test instructions: "Application scenario: Imagine that you are the user of an interactive music remixing system that allows you to make dedicated remixes of music material. The system provides mixing-desk-style sliders for each instrument to change its level, spatial position, etc.
Due to the nature of the system, some extreme sound mixes can lead to distortion that degrades the overall sound quality. On the other hand, sound mixes with similar instrument levels tend to produce better sound quality.
The purpose of this test is to evaluate different processing algorithms with regard to their impact on the strength of the sound modification and on the sound quality.
There is no "reference signal" in this test! Instead, a description of the desired sound mixes is given below.
For each audio item:
- first, read the description of the desired sound mix that you, as a system user, would like to achieve:
Item "BlackCoffee": soft brass section within the sound mix
Item "VoiceOverMusic": soft background music
Item "Audition": strong vocal sound and soft music
Item "LovePop": soft string section within the sound mix
- then grade the signals using one common grade to describe both
  - achieving the desired interpretation goal of the sound mix
  - sound quality of the overall scenario (consider distortions, artifacts, unnaturalness, spatial distortions, ...)"
A total of 8 listeners participated in each of the tests performed. All subjects can be considered experienced listeners. The test conditions were automatically randomized for each test item and for each listener. The subjective responses were recorded by a computer-based hearing test program on a scale ranging from 0 to 100, with five intervals labeled in the same way as on the MUSHRA scale. Instantaneous switching between the items under test was allowed.
7.5. HEARING TEST RESULTS
The graphs in Figure 7 show the average score per item over all listeners and the statistical mean over all evaluated items, together with the associated 95% confidence intervals.
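For illustration, a per-item mean score and its 95% confidence interval for a small listener panel can be computed as sketched below. The text does not state how the intervals in Figure 7 were derived; the use of a Student-t interval is an assumption, and the scores in the example are invented placeholders, not test data.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {7: 2.365, 9: 2.262, 15: 2.131}


def mean_ci95(scores):
    """Return (mean, lower, upper) of a 95% confidence interval.

    Assumes independent listener scores and a Student-t interval, which is
    one common choice for small panels; this is an assumption of the sketch.
    """
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)   # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical scores of 8 listeners on the 0..100 scale.
    placeholder_scores = [60, 70, 80, 90, 60, 70, 80, 90]
    print(mean_ci95(placeholder_scores))
```

Two conditions whose intervals do not overlap are commonly read as significantly different, which is the kind of comparison the observations on the test results rely on.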
The following observations can be made based on the results of the conducted hearing tests: the MUSHRA scores obtained show that the DCU functionality provides significantly better performance than the regular SAOC RM system in terms of the overall statistical mean values. It should be noted that the quality of all items produced by the regular SAOC decoder (which exhibit strong audio artifacts for the extreme interpretation conditions considered) is graded as low as the quality of the downmix-like interpretation settings, which do not fulfil the desired interpretation scenario at all. Thus, it can be concluded that the proposed DCU methods lead to a considerable improvement in the subjective signal quality for all the hearing test scenarios considered.
8. CONCLUSIONS
To summarize the discussion above, interpretation coefficient limitation schemes for distortion control in SAOC have been described. The embodiments according to the invention can be used in combination with parametric techniques for the bit-rate-efficient transmission/storage of audio scenarios that contain multiple audio objects, which have recently been proposed (see, for example, references [1], [2], [3], [4] and [5]).
In combination with user interactivity at the receiving end, these techniques can conventionally (without using the inventive interpretation coefficient limitation schemes) lead to poor output signal quality if extreme object interpretation is performed (see, for example, reference [6]).
The present specification focuses on Spatial Audio Object Coding (SAOC), which provides means for a user interface for selecting the desired playback configuration (e.g., mono, stereo, 5.1, etc.) and for interactive real-time modification of the desired output interpretation scenario by controlling the interpretation matrix according to personal preference or other criteria. However, the invention is also applicable to parametric techniques in general.
Due to the downmix/separation/mix-based parametric approach, the subjective quality of the interpreted audio output depends on the interpretation parameter settings. The freedom to select interpretation settings of the user's choice implies the risk that the user selects inappropriate object interpretation options, such as extreme gain manipulations of an object within the overall sound scenario. For a commercial product, it is by all means unacceptable to produce poor sound quality and/or audio artifacts for any settings in the user interface. In order to control excessive deterioration of the SAOC audio output, several computational measures have been described that are based on the idea of computing a measure of the perceptual quality of the interpreted scenario and, depending on that measure (and, optionally, other information), modifying the interpretation coefficients actually applied (see, for example, reference [6]).
This document describes alternative ideas for safeguarding the subjective sound quality of the interpreted SAOC scenario, for which all processing is performed entirely within the SAOC decoder/transcoder and which do not involve the explicit calculation of sophisticated measures of the perceived sound quality of the interpreted sound scenario.
These ideas can therefore be implemented in a structurally simple and extremely efficient manner within the framework of the SAOC decoder/transcoder. The proposed distortion control unit (DCU) algorithm aims at limiting input parameters of the SAOC decoder, namely, the interpretation coefficients.
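The structural simplicity of this limitation can be illustrated with a short sketch of the linear combination of the user-specified and target interpretation matrices. The blend form (1 - g) * M_ren + g * M_ren,tar is an assumption of this sketch, chosen to agree with the claims' statement that smaller values of the combination parameter give the user-specified matrix a stronger contribution; the function name and the example matrices are hypothetical.

```python
def limit_rendering_matrix(m_ren, m_tar, g_dcu):
    """Blend a user-specified matrix with a target matrix.

    m_ren, m_tar: lists of rows (output channels x audio objects).
    g_dcu: combination parameter in [0, 1]; 0 keeps the user matrix
    unchanged, 1 replaces it with the target matrix (assumed convention).
    """
    if not 0.0 <= g_dcu <= 1.0:
        raise ValueError("g_dcu must lie in [0, 1]")
    if len(m_ren) != len(m_tar) or any(
        len(r) != len(t) for r, t in zip(m_ren, m_tar)
    ):
        raise ValueError("matrices must have the same shape")
    return [
        [(1.0 - g_dcu) * r + g_dcu * t for r, t in zip(row_ren, row_tar)]
        for row_ren, row_tar in zip(m_ren, m_tar)
    ]


if __name__ == "__main__":
    # Hypothetical stereo scene with three objects: the user solos object 0.
    m_user = [[2.0, 0.0, 0.0],
              [2.0, 0.0, 0.0]]
    # Hypothetical target, e.g. a scaled downmix-like matrix.
    m_target = [[1.0, 1.0, 0.0],
                [0.0, 1.0, 1.0]]
    print(limit_rendering_matrix(m_user, m_target, 0.5))
```

Because the operation is a per-coefficient weighted sum, it needs no analysis of the audio signals themselves, which is why it fits entirely inside the decoder/transcoder.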
To summarize the above, the embodiments according to the invention create an audio encoder, an audio decoder, an encoding method, a decoding method, and computer programs for encoding or decoding, or encoded audio signals, as described above.
9. IMPLEMENTATION ALTERNATIVES
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus. Some or all of the method steps can be performed by (or using) a hardware device, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps can be performed by such a device.
The inventive encoded audio signal can be stored on a digital storage medium or it can be transmitted on a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the Internet.
Depending on certain implementation requirements, the embodiments of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example, a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored on it, which cooperate (or are able to cooperate) with a programmable computer system so that the respective method is carried out. Therefore, the digital storage medium can be computer readable.
Some embodiments, according to the invention, comprise a data carrier having electronically readable control signals, which are able to cooperate with a programmable computer system, so that one of the methods described here is performed.
In general, the embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product is run on a computer. The program code can, for example, be stored on a machine-readable medium.
Other embodiments include the computer program for performing one of the methods described here, stored on a machine-readable medium.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code to perform one of the methods described herein, when the computer program is executed on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (either a digital storage medium or a computer-readable medium) comprising, recorded on it, the computer program for carrying out one of the methods described herein. The data carrier, the digital storage medium or the recorded medium is typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program to perform one of the methods described herein. The data stream or signal sequence can, for example, be configured to be transferred via a data communication connection, for example, via the Internet.
An additional embodiment comprises a processing means, for example, a computer or a programmable logic device configured or adapted to carry out one of the methods described herein.
A further embodiment comprises a computer having the computer program installed on it to carry out one of the methods described herein.
In some embodiments, a programmable logic device (for example, a field-programmable gate array) can be used to perform some or all of the functionality of the methods described here. In some embodiments, a field-programmable gate array can cooperate with a microprocessor in order to perform one of the methods described here. In general, the methods are preferably performed by any hardware device.
The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is intended, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments here.
REFERENCES
[1] C. Faller and F. Baumgarte, "Binaural Cue Coding - Part II: Schemes and applications", IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, Nov. 2003.
[2] C. Faller, "Parametric Joint-Coding of Audio Sources", 120th AES Convention, Paris, 2006, Preprint 6752.
[3] J. Herre, S. Disch, J. Hilpert, O. Hellmuth: "From SAC To SAOC - Recent Developments in Parametric Coding of Spatial Audio", 22nd Regional UK AES Conference, Cambridge, UK, April 2007.
[4] J. Engdegård, B. Resch, C. Falch, O. Hellmuth, J. Hilpert, A. Holzer, L. Terentiev, J. Breebaart, J. Koppens, E. Schuijers and W. Oomen: "Spatial Audio Object Coding (SAOC) - The Upcoming MPEG Standard on Parametric Object Based Audio Coding", 124th AES Convention, Amsterdam 2008, Preprint 7377.
[5] ISO/IEC, "MPEG audio technologies - Part 2: Spatial Audio Object Coding (SAOC)", ISO/IEC JTC1/SC29/WG11 (MPEG) FCD 23003-2.
[6] US patent application 61/173,456, METHODS, APPARATUS, AND COMPUTER PROGRAMS FOR DISTORTION AVOIDING AUDIO SIGNAL PROCESSING.
[7] EBU Technical recommendation: "MUSHRA-EBU Method for Subjective Listening Tests of Intermediate Audio Quality", Doc. B/AIM022, October 1999.
[8] ISO/IEC JTC1/SC29/WG11 (MPEG), Document N10843, "Study on ISO/IEC 23003-2:200x Spatial Audio Object Coding (SAOC)", 89th MPEG Meeting, London, UK, July 2009.
Claims (20)
1. AUDIO PROCESSING APPARATUS (100; 200) FOR PROVIDING AN UPMIX SIGNAL REPRESENTATION (130; 230) BASED ON A DOWNMIX SIGNAL REPRESENTATION (110; 210) AND OBJECT-RELATED PARAMETRIC INFORMATION, which are included in a bitstream representation (300) of an audio content, and depending on a user-specified interpretation matrix (144; M_ren) that defines a desired contribution of a plurality of audio objects to one, two or more output audio channels, the apparatus being characterized by comprising: a distortion limiter (140; 240) configured to obtain a modified interpretation matrix (142; M_ren,lim) using a linear combination of the user-specified interpretation matrix (M_ren) and a distortion-free target interpretation matrix (M_ren,tar) depending on a linear combination parameter (146; g_DCU); and a signal processor (148; 248) configured to obtain the upmix signal representation based on the downmix signal representation and the object-related parametric information using the modified interpretation matrix; wherein the apparatus is configured to evaluate a bitstream element (306; bsDcuParameter) that represents the linear combination parameter (146; g_DCU) in order to obtain the linear combination parameter.
2. APPARATUS (100; 200), according to claim 1, characterized in that the distortion limiter is configured to obtain the target interpretation matrix (M_ren,tar) so that the target interpretation matrix is a distortion-free target interpretation matrix.
3. Apparatus (100; 200) according to claim 1 or claim 2, characterized in that the distortion limiter is configured to obtain the modified interpretation matrix M_ren,lim according to:
4. Apparatus (100; 200) according to one of claims 1 to 3, characterized in that the distortion limiter is configured to obtain the target interpretation matrix (M_ren,tar) so that the target interpretation matrix is a downmix-like target interpretation matrix.
5. Apparatus (100; 200) according to one of claims 1 to 4, characterized in that the distortion limiter is configured to scale an extended downmix matrix (D_DS) using an energy normalization scalar (sqrt(N_DS)) to obtain the target interpretation matrix (M_ren,tar), wherein the extended downmix matrix is an extended version of a downmix matrix, one or more rows of which describe contributions of a plurality of audio object signals to one or more channels of the downmix signal representation, extended by rows of zero elements, so that the number of rows of the extended downmix matrix matches the interpretation constellation described by the user-specified interpretation matrix (M_ren).
6. Apparatus (100; 200) according to one of claims 1 to 3, characterized in that the distortion limiter is configured to obtain the target interpretation matrix (M_ren,tar) so that the target interpretation matrix is a best-effort target interpretation matrix.
7. Apparatus (100; 200) according to one of claims 1 to 3 or 6, characterized in that the distortion limiter is configured to obtain the target interpretation matrix (M_ren,tar) so that the target interpretation matrix depends on a downmix matrix (D) and the user-specified interpretation matrix (M_ren).
8. Apparatus (100; 200) according to one of claims 1 to 3, 6 or 7, characterized in that the distortion limiter is configured to compute a matrix (N_BE) comprising channel-individual energy normalization values for a plurality of output audio channels of the apparatus for providing an upmix signal representation, so that an energy normalization value for a given output audio channel of the apparatus describes, at least approximately, a ratio between a sum of the energy interpretation values associated with the given output audio channel in the user-specified interpretation matrix for a plurality of audio objects and a sum of energy downmix values for the plurality of audio objects; and wherein the distortion limiter is configured to scale a set of downmix values using a channel-individual energy normalization value to obtain a set of interpretation values of the target interpretation matrix (M_ren,tar) associated with the given output channel.
9. Apparatus (100; 200) according to one of claims 1 to 3 and 6 to 8, characterized in that the distortion limiter is configured to compute a matrix comprising the channel-individual energy normalization values for a plurality of output audio channels according to:
10. Apparatus (100; 200) according to one of claims 1 to 3 or 6 to 7, characterized in that the distortion limiter is configured to compute a matrix that describes a channel-individual energy normalization for a plurality of output audio channels of the apparatus depending on the user-specified interpretation matrix (M_ren) and a downmix matrix D; and wherein the distortion limiter is configured to apply the matrix that describes the channel-individual energy normalization to obtain a set of interpretation coefficients of the target interpretation matrix (M_ren,tar) associated with a given output audio channel of the apparatus as a linear combination of sets of downmix values associated with the different channels of the downmix signal representation.
11. Apparatus (100; 200) according to one of claims 1 to 3 or 6 to 7 or 10, characterized in that the distortion limiter is configured to compute a matrix that describes the channel-individual energy normalization for a plurality of output audio channels according to:
12. Apparatus (100; 200) according to one of claims 1 to 3 or 6 to 7 or 10, characterized in that the distortion limiter is configured to compute a matrix N_BE according to
13. Apparatus (100; 200) according to one of claims 1 to 3 or 6 to 7, characterized in that the distortion limiter is configured to compute an energy normalization scalar N_BE according to
14. Apparatus (100; 200) according to one of claims 1 to 13, characterized in that the apparatus is configured to read an index value (idx) which represents the linear combination parameter (g_DCU) from the bitstream representation of the audio content and to map the index value to the linear combination parameter (g_DCU) using a parameter quantization table.
15. APPARATUS (100; 200), according to claim 14, characterized in that the quantization table describes a non-uniform quantization, in which smaller values of the linear combination parameter (g_DCU), which describe a stronger contribution of the user-specified interpretation matrix (M_ren) to the modified interpretation matrix (M_ren,lim), are quantized with a higher resolution.
16. Apparatus (100; 200) according to one of claims 1 to 15, characterized in that the apparatus is configured to evaluate a bitstream element (bsDcuMode) which describes a distortion limiting mode, and in which the distortion limiter is configured to selectively obtain the target interpretation matrix so that the target interpretation matrix is a downmix-like target interpretation matrix, or so that the target interpretation matrix is a best-effort target interpretation matrix.
17. APPARATUS (150) FOR PROVIDING A BIT STREAM (170) REPRESENTING A MULTI-CHANNEL AUDIO SIGNAL, the apparatus being characterized by comprising: a downmixer (180) configured to provide a downmix signal (182) based on a plurality of audio object signals (160a-160N); a side information provider (184) configured to provide object-related parametric side information (186) that describes characteristics of the audio object signals (160a-160N) and downmix parameters, and a linear combination parameter (188) describing desired contributions of a user-specified interpretation matrix (M_ren) and a target interpretation matrix (M_ren,tar) to a modified interpretation matrix (M_ren,lim) to be used by an apparatus (100; 200) for providing an upmix signal representation based on the bit stream; and a bit stream formatter (190) configured to provide a bit stream (170) comprising a representation of the downmix signal, the object-related parametric side information and the linear combination parameter.
18. AUDIO PROCESSING METHOD FOR PROVIDING AN UPMIX SIGNAL REPRESENTATION BASED ON A DOWNMIX SIGNAL REPRESENTATION AND OBJECT-RELATED PARAMETRIC INFORMATION, which are included in a bitstream representation of an audio content, and depending on a user-specified interpretation matrix that defines a desired contribution of a plurality of audio objects to one, two or more output audio channels, the method being characterized by comprising: evaluating a bitstream element that represents a linear combination parameter, in order to obtain the linear combination parameter; obtaining a modified interpretation matrix using a linear combination of the user-specified interpretation matrix and a distortion-free target interpretation matrix depending on the linear combination parameter; and obtaining the upmix signal representation based on the downmix signal representation and the object-related parametric information using the modified interpretation matrix.
19. METHOD FOR PROVIDING A BIT STREAM THAT REPRESENTS A MULTI-CHANNEL AUDIO SIGNAL, the method being characterized by comprising: providing a downmix signal based on a plurality of audio object signals; providing object-related parametric side information that describes characteristics of the audio object signals and downmix parameters, and a linear combination parameter that describes desired contributions of a user-specified interpretation matrix and a target interpretation matrix to a modified interpretation matrix; and providing a bit stream comprising a representation of the downmix signal, the object-related parametric side information and the linear combination parameter; wherein the user-specified interpretation matrix defines a desired contribution of a plurality of audio objects to one, two or more output audio channels.
20. BIT STREAM (300) REPRESENTING A MULTI-CHANNEL AUDIO SIGNAL, the bit stream being characterized by comprising: a representation (302) of a downmix signal that combines audio signals of a plurality of audio objects; object-related parametric information (304) that describes characteristics of the audio objects; and a linear combination parameter (306) that describes desired contributions of a user-specified interpretation matrix and a target interpretation matrix to a modified interpretation matrix.
Similar technologies:
Publication number | Publication date | Patent title
BR112012012097B1|2021-01-05|apparatus for providing an upmix signal representation based on the downmix signal representation, apparatus for providing a bit stream representing a multichannel audio signal, methods and bit stream representing a multichannel audio signal using a linear combination parameter
JP5554830B2|2014-07-23|Device for supplying one or more adjusted parameters for the provision of an upmix signal representation based on a downmix signal representation, an audio signal decoder using object-related parametric information, an audio signal transcoder, an audio signal Encoder, audio bitstream, method and computer program
ES2529219T3|2015-02-18|Apparatus for providing a representation of upstream signal based on the representation of a downlink signal, apparatus for providing a bitstream representing a multi-channel audio signal, methods, computer program and a bitstream which uses a distortion control signaling
KR101742137B1|2017-05-31|Decoder, encoder and method for informed loudness estimation employing by-pass audio object signals in object-based audio coding systems
JP5758902B2|2015-08-05|Apparatus, method, and computer for providing one or more adjusted parameters using an average value for providing a downmix signal representation and an upmix signal representation based on parametric side information related to the downmix signal representation program
BRPI1005299B1|2020-11-24|apparatus and method to perform the upmmix on a downmix audio signal
PT2483887T|2017-10-23|Mpeg-saoc audio signal decoder, method for providing an upmix signal representation using mpeg-saoc decoding and computer program using a time/frequency-dependent common inter-object-correlation parameter value
TW201118860A|2011-06-01|Apparatus, method and computer program for upmixing a downmix audio signal using a phase value smoothing
BR112012008921B1|2021-11-16|MECHANISM AND METHOD FOR PROVIDING ONE OR MORE ADJUSTED PARAMETERS FOR THE PROVISION OF AN UPMIX SIGNAL REPRESENTATION BASED ON A DOWNMIX SIGNAL REPRESENTATION AND A PARAMETRIC SIDE INFORMATION ASSOCIATED WITH THE DOWNMIX SIGNAL REPRESENTATION, USING AN AVERAGE
BR112012007138B1|2021-11-30|AUDIO SIGNAL DECODER, AUDIO SIGNAL ENCODER, METHOD FOR PROVIDING UPLOAD SIGNAL MIXED REPRESENTATION, METHOD FOR PROVIDING DOWNLOAD SIGNAL AND BITS FLOW REPRESENTATION USING A COMMON PARAMETER VALUE OF INTRA-OBJECT CORRELATION
BR112014010062B1|2021-12-14|AUDIO OBJECT ENCODER, AUDIO OBJECT DECODER, AUDIO OBJECT ENCODING METHOD, AND AUDIO OBJECT DECODING METHOD
Patent family:
Publication number | Publication date
EP2489038B1|2016-01-13|
KR20120084314A|2012-07-27|
CN102714038B|2014-11-05|
US8571877B2|2013-10-29|
PL2489038T3|2016-07-29|
RU2607267C2|2017-01-10|
JP2013511738A|2013-04-04|
MY154641A|2015-07-15|
US20120259643A1|2012-10-11|
RU2012127554A|2013-12-27|
BR112012012097A2|2017-12-12|
CA2781310A1|2011-05-26|
ES2569779T3|2016-05-12|
KR101414737B1|2014-07-04|
TW201131553A|2011-09-16|
JP5645951B2|2014-12-24|
AU2010321013A1|2012-07-12|
WO2011061174A1|2011-05-26|
MX2012005781A|2012-11-06|
EP2489038A1|2012-08-22|
TWI441165B|2014-06-11|
AU2010321013B2|2014-05-29|
CN102714038A|2012-10-03|
CA2781310C|2015-12-15|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title
BR0304540A|2002-04-22|2004-07-20|Koninkl Philips Electronics Nv|Methods for encoding an audio signal, and for decoding an encoded audio signal, encoder for encoding an audio signal, apparatus for providing an audio signal, encoded audio signal, storage medium, and decoder for decoding an audio signal. encoded audio|
US8843378B2|2004-06-30|2014-09-23|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Multi-channel synthesizer and method for generating a multi-channel output signal|
KR100663729B1|2004-07-09|2007-01-02|한국전자통신연구원|Method and apparatus for encoding and decoding multi-channel audio signal using virtual source location information|
CN101138274B|2005-04-15|2011-07-06|杜比国际公司|Envelope shaping of decorrelated signals|
CN102693727B|2006-02-03|2015-06-10|韩国电子通信研究院|Method for control of randering multiobject or multichannel audio signal using spatial cue|
WO2007111568A2|2006-03-28|2007-10-04|Telefonaktiebolaget L M Ericsson |Method and arrangement for a decoder for multi-channel surround sound|
EP2038878B1|2006-07-07|2012-01-18|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for combining multiple parametrically coded audio sources|
MX2009003564A|2006-10-16|2009-05-28|Fraunhofer Ges Forschung|Apparatus and method for multi -channel parameter transformation.|
EP2054875B1|2006-10-16|2011-03-23|Dolby Sweden AB|Enhanced coding and parameter representation of multichannel downmixed object coding|
JP5209637B2|2006-12-07|2013-06-12|エルジーエレクトロニクスインコーポレイティド|Audio processing method and apparatus|
EP2595149A3|2006-12-27|2013-11-13|Electronics and Telecommunications Research Institute|Apparatus for transcoding downmix signals|
US20100119073A1|2007-02-13|2010-05-13|Lg Electronics, Inc.|Method and an apparatus for processing an audio signal|
MX2008013073A|2007-02-14|2008-10-27|Lg Electronics Inc|Methods and apparatuses for encoding and decoding object-based audio signals.|
KR101244545B1|2007-10-17|2013-03-18|프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베.|Audio coding using downmix|
KR101024924B1|2008-01-23|2011-03-31|엘지전자 주식회사|A method and an apparatus for processing an audio signal|
ES2753899T3|2008-03-04|2020-04-14|Fraunhofer Ges Forschung|Mixing inbound data streams and generating an outbound data stream from them|
US8315396B2|2008-07-17|2012-11-20|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Apparatus and method for generating audio output signals using object based metadata|
MX2011011399A|2008-10-17|2012-06-27|Univ Friedrich Alexander Er|Audio coding using downmix.|
CN104822036B|2010-03-23|2018-03-30|杜比实验室特许公司|The technology of audio is perceived for localization|
US10158958B2|2010-03-23|2018-12-18|Dolby Laboratories Licensing Corporation|Techniques for localized perceptual audio|
KR20120071072A|2010-12-22|2012-07-02|한국전자통신연구원|Broadcastiong transmitting and reproducing apparatus and method for providing the object audio|
TWI543642B|2011-07-01|2016-07-21|杜比實驗室特許公司|System and method for adaptive audio signal generation, coding and rendering|
ES2638391T3|2012-08-10|2017-10-20|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Encoder, decoder, system and procedure that employs a residual concept for parametric coding of an audio object|
EP2717265A1|2012-10-05|2014-04-09|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Encoder, decoder and methods for backward compatible dynamic adaption of time/frequency resolution in spatial-audio-object-coding|
US10068579B2|2013-01-15|2018-09-04|Electronics And Telecommunications Research Institute|Encoding/decoding apparatus for processing channel signal and method therefor|
WO2014112793A1|2013-01-15|2014-07-24|한국전자통신연구원|Encoding/decoding apparatus for processing channel signal and method therefor|
EP2804176A1|2013-05-13|2014-11-19|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio object separation from mixture signal using object-specific time/frequency resolutions|
AU2014270299B2|2013-05-24|2017-08-10|Dolby International Ab|Coding of audio scenes|
EP2973551B1|2013-05-24|2017-05-03|Dolby International AB|Reconstruction of audio scenes from a downmix|
CN105229732B|2013-05-24|2018-09-04|杜比国际公司|Efficient coding of audio scenes comprising audio objects|
US9818412B2|2013-05-24|2017-11-14|Dolby International Ab|Methods for audio encoding and decoding, corresponding computer-readable media and corresponding audio encoder and decoder|
CN110085240A|2013-05-24|2019-08-02|杜比国际公司|Efficient coding of audio scenes comprising audio objects|
TWM487509U|2013-06-19|2014-10-01|杜比實驗室特許公司|Audio processing apparatus and electrical device|
KR102243395B1|2013-09-05|2021-04-22|한국전자통신연구원|Apparatus for encoding audio signal, apparatus for decoding audio signal, and apparatus for replaying audio signal|
EP3044876B1|2013-09-12|2019-04-10|Dolby Laboratories Licensing Corporation|Dynamic range control for a wide variety of playback environments|
CN105659320B|2013-10-21|2019-07-12|杜比国际公司|Audio coder and decoder|
EP3069528B1|2013-11-14|2017-09-13|Dolby Laboratories Licensing Corporation|Screen-relative rendering of audio and encoding and decoding of audio for such rendering|
JP6439296B2|2014-03-24|2018-12-19|ソニー株式会社|Decoding apparatus and method, and program|
WO2015150384A1|2014-04-01|2015-10-08|Dolby International Ab|Efficient coding of audio scenes comprising audio objects|
WO2015183060A1|2014-05-30|2015-12-03|삼성전자 주식회사|Method, apparatus, and computer-readable recording medium for providing audio content using audio object|
CN105227740A|2014-06-23|2016-01-06|张军|Method for realizing a three-dimensional sound field auditory effect on a mobile terminal|
TWI587286B|2014-10-31|2017-06-11|杜比國際公司|Method and system for decoding and encoding of audio signals, computer program product, and computer-readable medium|
CN105989845B|2015-02-25|2020-12-08|杜比实验室特许公司|Video content assisted audio object extraction|
EA202090186A3|2015-10-09|2020-12-30|Долби Интернешнл Аб|AUDIO ENCODING AND DECODING USING REPRESENTATION CONVERSION PARAMETERS|
Legal status:
2019-01-08| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-09-03| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-07-21| B07A| Application suspended after technical examination (opinion) [chapter 7.1 patent gazette]|
2020-11-03| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-01-05| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 10 (TEN) YEARS COUNTED FROM 05/01/2021, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
US26304709P| true| 2009-11-20|2009-11-20|
US61/263,047|2009-11-20|
US36926110P| true| 2010-07-30|2010-07-30|
US61/369,261|2010-07-30|
EP10711452.5|2010-07-30|
EP10171452|2010-07-30|
PCT/EP2010/067550|WO2011061174A1|2009-11-20|2010-11-16|Apparatus for providing an upmix signal representation on the basis of the downmix signal representation, apparatus for providing a bitstream representing a multi-channel audio signal, methods, computer programs and bitstream representing a multi-channel audio signal using a linear combination parameter|